Clustering based Two-Stage Text Classification Requiring Minimal Training Data


DOI: /CSIS Z

Xue Zhang 1,2 and Wangxin Xiao 3,4

1 Key Laboratory of High Confidence Software Technologies, Ministry of Education, Peking University, Beijing, China
1 School of Electronics Engineering and Computer Science, Peking University, Beijing, China
2 Department of Physics, Shangqiu Normal University, Shangqiu, China
jane_zhang@pku.edu.cn
3 Department of Computer Science, Jinggangshan University, Ji'an, China
4 School of Traffic and Transportation Engineering, Changsha University of Science and Technology, Changsha, China
wx.xao@roh.cn

Abstract. Clustering has been employed to expand training data in some semi-supervised learning methods. Clustering based methods rest on the assumption that the clusters learned under the guidance of the initial training data can, to some extent, characterize the underlying distribution of the data set. However, our experiments show that whether this assumption holds depends on both the separability of the data set and the size of the training data set. It is often violated on data sets of bad separability, especially when the initial training data are too few, and in this case clustering based methods perform worse. In this paper, we propose a clustering based two-stage text classification approach to address this problem. In the first stage, labeled and unlabeled data are first clustered with the guidance of the labeled data. Then a self-training style clustering strategy is used to iteratively expand the training data under the guidance of an oracle or expert. In the second stage, discriminative classifiers are trained on the expanded labeled data set. Unlike other clustering based methods, the proposed clustering strategy can effectively cope with data of bad separability. Furthermore, the proposed framework converts the challenging problem of sparsely labeled text classification into a supervised one. Therefore, supervised classification models, e.g. SVM, can be applied, and techniques proposed for supervised learning, such as feature selection, sampling methods, and data editing or noise filtering, can be used to further improve the classification accuracy. Our experimental results demonstrate the effectiveness of the proposed approach, especially when the training data set is very small.

Keywords: text classification, clustering, active semi-supervised clustering, two-stage classification.

1. Introduction

The goal of automatic text classification is to automatically assign documents to a number of predefined categories. It is of great importance because of the ever-expanding amount of text documents available in digital form in many real-world applications, such as web-page classification and recommendation, and email processing and filtering. Text classification was originally treated as a supervised learning task, and a large number of supervised learning algorithms have been developed, such as Support Vector Machines (SVM) [1], Naïve Bayes [2], Nearest Neighbor [3], and Neural Networks [4]. A comparative study was given in [5]. SVM has been recognized as one of the most effective text classification methods. Furthermore, a number of techniques suitable for supervised learning have been proposed to improve classification accuracy, such as feature selection, data editing or noise filtering, and sampling methods against bias.

A supervised classification model often needs a very large amount of training data for the classifier to generalize well. The classification accuracy of traditional supervised text classification algorithms degrades dramatically as the number of training examples in each class decreases. Manually labeling the training data for a machine learning algorithm is a tedious and time-consuming process, and sometimes even impractical (e.g., online web-page recommendation). Correspondingly, one important challenge for automatic text classification is how to reduce the number of labeled documents required for building a reliable text classifier. This leads to an active research problem, semi-supervised learning. A number of semi-supervised text classification methods have been proposed, including Transductive SVM (TSVM) [6], Co-Training [7] and EM [8]. A comprehensive review can be found in [9]. By exploiting the information contained in unlabeled data, these methods obtain considerable improvement over supervised methods when the training data set is relatively small. However, most of these methods adopt an iterative approach that trains an initial classifier based on the distribution of the labeled data.
They still face difficulties when the labeled data set is extremely small: they have a poor starting point and accumulate more errors over the iterations when the extremely few labeled data are far from the corresponding class centers, which is likely because of the high dimensionality.

To address the problem of sparsely labeled text classification, we present a clustering based two-stage text classification method that uses both labeled and unlabeled data. Experimental results on several real-world data sets validate the effectiveness of our proposed approach. Our contributions can be summarized as follows.

(1) We propose a novel clustering based two-stage classification approach that requires minimal training data to achieve high classification accuracy.
(2) In order to improve the accuracy of the training data self-labeled by clustering, we propose an active semi-supervised clustering method to cope with data sets of bad separability.
(3) On the basis of clustering, we convert the challenging problem of sparsely labeled text classification into a supervised one. Thus supervised classification models and techniques suitable for text classification can be used to further improve the overall performance.

We conduct extensive experiments to validate our approach and study related issues.

The rest of this paper is organized as follows. Section 2 reviews several existing methods. Our clustering method is given in Section 3 with some analysis. The detailed algorithm is then presented in Section 4. Experimental results are presented in Section 5. Section 6 concludes this paper.

ComSIS Vol. 9, No. 4, Special Issue, December 2012

2. Related Work

Clustering has been applied in many sub-domains of the text classification problem, including feature compression or extraction [10], semi-supervised learning [11], and clustering in large-scale classification problems [12,13]. The following reviews related work on clustering aiding classification in the area of semi-supervised learning. A comprehensive review of text classification aided by clustering can be found in [14].

Clustering has been used to extract information from unlabeled data in order to boost the classification task. There are roughly four cases of semi-supervised classification aided by clustering. In particular, clustering is used: (1) to create a training set from the unlabeled set [15], (2) to augment an existing labeled set with new documents from the unlabeled data set [11], (3) to augment the data set with new features [8,16], and (4) to co-train a classifier [17,18]. More recently, simultaneous learning frameworks for clustering and classification have been proposed [19,20].

To make use of unlabeled data, one assumption made, explicitly or implicitly, by most semi-supervised learning algorithms is the so-called cluster assumption: two points are likely to have the same class label if there is a path connecting them that passes through regions of high density only. That is, the decision boundary should lie in regions of low density. Based on the ideas of spectral clustering and random walks, a framework for constructing kernels that implement the cluster assumption was proposed in [21].
Also based on the cluster assumption, [22] applied spectral clustering to represent the labeled and unlabeled data. By clustering unlabeled data together with labeled data using probabilistic and fuzzy approaches, [23] proposed a framework to improve the performance of a base classifier with unlabeled data. In text classification, there are often many low-density areas between positive and negative labeled examples because of the high dimensionality and data sparseness. This situation worsens as the number of training examples in each class decreases.

The most closely related work is the clustering based text classification (CBC) approach [11]. In CBC, semi-supervised soft k-means is first used to cluster the labeled and unlabeled data into k clusters, where k is set to the number of classes in the classification task. The p% most confident unlabeled examples from each cluster (i.e., the ones nearest to the cluster's centroid) are added to the training data set. Then TSVM is trained on the augmented training data set and the unlabeled data set. Similarly, the p% most confident unlabeled examples from each class (i.e., the ones with the largest margin) are added to the training data set. CBC alternates the clustering step and the classification step until there is no unlabeled data left.

In CBC, in order to guarantee the labeling accuracy, the value of p should be small enough. That is, after the clustering step in each iteration, the training data set is augmented with very few examples. Therefore, the classifier in the following classification step must perform acceptably with a small training data set. This puts a strong constraint on the selected classification models. CBC can hardly perform well with supervised classification models, e.g. SVM, as will be demonstrated later.

The success of CBC is based on the assumption that even when some of the data points are wrongly classified, the most confident data points, i.e., the ones with the largest margin under the classification model and the ones nearest to the centroids under the clustering model, are correctly classified or clustered. This assumption guarantees the high accuracy of the self-labeled training data and, correspondingly, the good performance of the algorithm. For convenience, we separate it into a clustering assumption and a classification assumption. However, our empirical experiments show that these assumptions are often violated on data sets of bad separability. Firstly, the clustering assumption cannot hold in this case, at least for the soft-constrained k-means [11]. In fact, each cluster's centroid may be located in: 1) the domain of its corresponding true class, 2) the border between its true class and other classes, 3) the domain of another class. The probability that the last two cases occur increases as data separability degrades.
In the last two cases (which we call cluster bias), CBC introduces more noise into the training data set in its clustering step. This may cause the classification assumption to be violated as well, since the noise has a large effect when the truly labeled training data are initially very few. The following iterative steps then accumulate further errors.

In sparsely labeled text classification, the extremely few training data make many techniques that are useful for ameliorating data separability, e.g. feature selection, ineffective, because the training data cannot characterize the whole data set well. When the training data set is extremely small, unsupervised learning gives better performance than supervised and semi-supervised learning algorithms. In this paper, we develop an active semi-supervised clustering based two-stage approach to address the problem of sparsely labeled text classification. Unlike CBC, our aim is to convert the problem of sparsely labeled text classification into a supervised one by using clustering. Therefore, supervised classification models, e.g. SVM, can be applied, and techniques proposed for supervised learning, such as feature selection, sampling methods, and data editing or noise filtering, can be used to further improve the classification accuracy. Furthermore, our proposed active semi-supervised clustering method aims to cope with data sets of any separability. The goal of clustering here is to generate, with high accuracy, enough training data for supervised learning.

3. Active Semi-supervised Clustering

When clustering is used to aid semi-supervised classification, the key point is that the clustering results can, to some extent, characterize the underlying distribution of the whole data set. Only in this case is clustering helpful for augmenting the training data set or extracting useful features to improve classification performance. Although clustering methods are more robust to the bias caused by the initial sparsely labeled data, empirical experience shows that the results of clustering may also be biased (e.g. the cluster bias of cases 2) and 3)), sometimes heavily, especially on data sets of bad separability. The soft-constrained k-means in CBC can reduce the bias in the labeled examples by basing the constraints (the guidance of the initially labeled data) not on exact examples but on their centroid. But it still cannot cope well with the bias in the training data on data sets of bad separability.

Table 1 gives two clustering based algorithms for augmenting training data. Both implement an iterative reinforcement strategy. In each iteration, a clustering method is used to cluster the whole data set with the guidance of the labeled training data, and then several examples are selected according to some criteria and labeled with the labels of the centroids they belong to. In the SemCC algorithm, the clustering method is the soft-constrained k-means adopted in the CBC algorithm. The clustering method in the SemCCAc algorithm (which we call active soft-constrained k-means) is proposed in order to address the cluster-bias problem.

Table 1. Two Clustering Algorithms: SemCC and SemCCAc

Input: Labeled data set $D_l$ and unlabeled data set $D_u$, the number of iterations maxiter, p
Output: Augmented labeled data set $D_l'$
Initialize: $D_l' = D_l$, $D_u' = D_u$, iter = 0

Algorithm SemCC:
While iter < maxiter and $D_u' \neq \emptyset$
    iter = iter + 1
    Calculate initial centroids: $o_i = \frac{1}{n_i} \sum_{x_j \in D_l',\, t_j = i} x_j$, $i = 1, \dots, c$, and set current centroids $o_i^* = o_i$. Here $n_i$ is the number of examples in $D_l'$ whose label is $i$. The labels of the centroids, $t(o_i) = t(o_i^*)$, are equal to the labels of the corresponding examples.
    Repeat until the cluster result does not change any more
        Assign $t(o_i^*)$ to each $x \in D_u'$ that is nearer to $o_i^*$ than to the other centroids.
        Update current centroids: $o_i^* = \frac{1}{n_i} \sum_{x_j \in D_l' \cup D_u',\, t_j = i} x_j$, $i = 1, \dots, c$, where $n_i$ is the number of examples in $D_l' \cup D_u'$ whose label is $i$.
        Calculate the nearest centroid $o_j$ for each $o_i^*$; if $t(o_i^*) \neq t(o_j)$, exit the loop.
    From each cluster, select the p% examples $x \in D_u'$ nearest to $o_i^*$, add them to $D_l'$, and delete them from $D_u'$.

Algorithm SemCCAc:
While iter < maxiter and $D_u' \neq \emptyset$
    iter = iter + 1
    Calculate initial centroids: $o_i = \frac{1}{n_i} \sum_{x_j \in D_l',\, t_j = i} x_j$, $i = 1, \dots, c$, and set current centroids $o_i^* = o_i$. Here $n_i$ is the number of examples in $D_l'$ whose label is $i$. The labels of the centroids, $t(o_i) = t(o_i^*)$, are equal to the labels of the corresponding examples.
    Repeat until the cluster result does not change any more
        Assign $t(o_i^*)$ to each $x \in D_u'$ that is nearer to $o_i^*$ than to the other centroids.
        Update current centroids: $o_i^* = \frac{1}{n_i} \sum_{x_j \in D_l' \cup D_u',\, t_j = i} x_j$, $i = 1, \dots, c$, where $n_i$ is the number of examples in $D_l' \cup D_u'$ whose label is $i$.
        Calculate the nearest centroid $o_j$ for each $o_i^*$; if $t(o_i^*) \neq t(o_j)$, exit the loop.
    From each cluster, select the p% examples $x \in D_u'$ nearest to $o_i^*$ and sort them in descending order of confidence, $x_1, \dots, x_m$.
    If the true labels of $x_1$ and $x_m$ both equal $t(o_i^*)$:
        add the m examples to $D_l'$
        delete them from $D_u'$
    else
        add $x_1$ and $x_m$ with their true labels to $D_l'$
        delete $x_1$ and $x_m$ from $D_u'$
    end

SemCCAc differs from SemCC in the labeling strategy for the selected examples of highest confidence according to the clustering results. Soft-constrained k-means does not take the location of the found centroids into consideration. It simply labels the selected examples nearest to each centroid. Therefore, it introduces much noise into the training data set in the presence of cluster bias. Active soft-constrained k-means first estimates the location of each centroid. Only for clusters whose centroids are located within their true classes does it label all the selected examples nearest to the corresponding centroids with the labels of their centroids. For clusters showing cluster bias, it labels just two examples per cluster with their true labels.

An important problem in active soft-constrained k-means is how to estimate the location of each cluster's centroid. The strategy used here is to query an oracle or expert, for each cluster, for the true labels of two examples: the nearest and the farthest examples to the centroid among the selected p% examples. If the two examples have the same label as their centroid, then all the p% selected examples are labeled with the label of the centroid. Otherwise, only the two examples are added to the training data set with their true labels. The strategy is based on the intuition that cluster bias is more likely to have occurred when the two examples have labels different from that of their centroid. When the two examples have the same label, but one different from their centroid's, the cluster's centroid is most likely located in the domain of another class. When only one of the two examples has the same label as the centroid, the cluster's centroid is most likely located on the border between the true class and other classes. We can filter out much noise by using this strategy.
A further benefit is that, by estimating the location of the clusters' centroids, SemCCAc can rectify cluster bias in the following iterations. This property maintains the high accuracy of the self-labeled training data in spite of poor starting points.
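The oracle check described above can be sketched for a single cluster as follows. This is a minimal illustration, not the authors' implementation: the function name `label_cluster`, the `oracle` callable, and the use of document identifiers as examples are all assumptions introduced here. The input list is assumed to be already sorted from nearest to farthest from the centroid.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def label_cluster(
    selected: Sequence[Hashable],
    centroid_label: int,
    oracle: Callable[[Hashable], int],
) -> List[Tuple[Hashable, int]]:
    """Return the (example, label) pairs to add to the training set.

    `selected` holds the p% examples nearest to the centroid of one
    cluster, sorted from nearest (x_1) to farthest (x_m).
    `oracle` is a hypothetical stand-in for the human expert: it
    returns the true label of an example.
    """
    if not selected:
        return []
    nearest, farthest = selected[0], selected[-1]
    true_nearest = oracle(nearest)
    true_farthest = oracle(farthest)
    if true_nearest == centroid_label and true_farthest == centroid_label:
        # Both queried examples agree with the centroid: trust the
        # cluster and label all selected examples with its label.
        return [(x, centroid_label) for x in selected]
    # Possible cluster bias: keep only the two oracle-labeled examples.
    pairs = [(nearest, true_nearest)]
    if len(selected) > 1:
        pairs.append((farthest, true_farthest))
    return pairs


# Toy usage: a dictionary lookup plays the role of the oracle.
truth = {"d1": 0, "d2": 0, "d3": 0, "d4": 1}
print(label_cluster(["d1", "d2", "d3"], 0, truth.__getitem__))
# [('d1', 0), ('d2', 0), ('d3', 0)]
print(label_cluster(["d1", "d2", "d4"], 0, truth.__getitem__))
# [('d1', 0), ('d4', 1)]
```

In the second call the farthest example disagrees with the centroid label, so only the two queried examples are kept, each with its true label. Each cluster costs two oracle queries per iteration under this scheme.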

[Figure 1: accuracy (y-axis) versus maxiter (x-axis) for SemCCAc (p=0.5), SemCC (p=0.5), SemCCAc (p=1), and SemCC (p=1).]

Fig. 1. Accuracy of self-labeled training data with iterations

Figure 1 depicts the average accuracy, over 20 runs, of the self-labeled training data obtained by applying the two clustering algorithms to a text classification problem (same2, consisting of the two most similar classes in 20 Newsgroups, with 5 training examples for each class). Figure 1 shows that the accuracy of the training data self-labeled by SemCCAc is significantly higher than that by SemCC. With a larger value of p, the accuracy degrades at first, but then rises as the number of iterations increases. When maxiter=1, SemCC reduces to the soft-constrained k-means. The average accuracy of SemCC is below 0.95 when maxiter=1, which indicates that soft-constrained k-means introduces noise into the training data set with a certain probability. This noise hurts the subsequent learning of the classifier, especially when the initial training data set is very small. We think the phenomenon of cluster bias can partially explain why the performance of CBC improves more slowly than that of TSVM and co-training as the amount of training data increases. As the number of iterations increases, the accuracy of the training data self-labeled by SemCC degrades much faster than that of SemCCAc. Therefore, techniques to cope with cluster bias are very important for clustering based semi-supervised classification. This also indicates that the proposed active semi-supervised clustering method is effective for addressing the problem of cluster bias.

4. Two-Stage Classification Framework: ACTC

In this section, we present the details of the Active semi-supervised Clustering based Two-stage text Classification algorithm (ACTC). All documents are tokenized into terms, and we construct one component for each distinct term. Thus each document is represented by a vector $(w_1, w_2, \dots, w_p)$, where $w_j$ is weighted by TFIDF. The cosine function is used in the clustering algorithm to calculate the distance from an example to the centroid. In the classification stage, we use an SVM classifier trained on the augmented training data to classify the whole data set. The detailed algorithm is presented in Table 2.

ACTC consists of two stages: a clustering stage and a classification stage. In the clustering stage, SemCCAc is used to augment the training data set. Users can set the values of maxiter and p to determine how many new documents should be labeled by SemCCAc. In the second stage, discriminative classifiers are trained on the expanded labeled data set. Soft-constrained k-means is in fact a generative classifier [11]. According to [24], generative classifiers reach their asymptotic performance faster than discriminative classifiers, but usually lead to higher asymptotic error. This motivates us to combine clustering with discriminative classifiers to address the problem of sparsely labeled text classification.

ACTC in fact converts the problem of sparsely labeled text classification into a supervised one, so supervised classification models suitable for text classification can be used. Moreover, the techniques proposed for supervised learning can be used to improve the performance. For instance, since some examples are unavoidably mislabeled in the clustering stage, data editing or noise filtering techniques are expected to improve the performance of ACTC. Other techniques, such as feature selection and sampling, can also be used to improve the performance.

Table 2. ACTC and CBCSVM

Input: Labeled data set $D_l$ and unlabeled data set $D_u$, the number of iterations maxiter, p
Output: The full labeled set $D = D_l + D_u$; a classifier L

Algorithm ACTC:
1. Clustering Stage
    Use SemCCAc (repeat maxiter iterations) to augment the training data set, obtaining an augmented training data set $D_l'$.
2. Classification Stage
    Train an SVM classifier L based on $D_l'$, and use the learned classifier to classify the whole data set $D$.

Algorithm CBCSVM:
iter = 0
1. While iter < maxiter/2
    iter = iter + 1
    1.1 Clustering step
        Use soft-constrained k-means to cluster the whole data set; for each cluster, select the p% unlabeled examples nearest to its centroid and add them to $D_l$.
    1.2 Classification step
        Train an SVM classifier based on $D_l$. From each class, select the p% unlabeled examples with the largest margin, and add them to $D_l$.
2. Train an SVM classifier L based on $D_l$, and use the learned classifier to classify the whole data set $D$.

In order to verify that the two-stage framework performs better than the CBC algorithm in supervised learning, we substitute SVM for TSVM in CBC and name the result CBCSVM. Note that, if we used TSVM as the classifier, the performance of both algorithms would be expected to improve. However, the time complexity of a TSVM classifier is much higher than that of an SVM classifier, because it repeatedly switches the estimated labels of unlabeled data and tries to find the maximal margin hyperplane. The more unlabeled data there are, the more time it requires, and the worse the data separability is, the more time it requires. For example, on same2, which consists of the two most similar classes of 20 Newsgroups with 1000 examples in each class, TSVM requires several hours to complete when 5 training examples for each class are used, whereas SVM needs only about 1 second. With enough training data, the performance of SVM is expected to be similar to that of TSVM, but it requires much less time. This motivates us to propose the two-stage classification method, which converts the problem of sparsely labeled text classification into a supervised one.

CBCSVM is also given in Table 2 for convenience. Since CBC selects p% unlabeled examples in both the clustering and the classification step of each iteration, we set its number of iterations to half of that in ACTC so that both algorithms perform the same number of selections. The difference between our approach and CBC is that we expand the training data set by a self-training style clustering process, resorting to an oracle or expert to evaluate the clusters' centroids.
After the training data expansion is complete, discriminative classifiers can be trained on the expanded training data set. Therefore, ACTC puts less constraint on the classification model, which enables us to treat the following classification stage as a supervised learning problem.

5. Performance Evaluation

5.1. Data sets

For a consistent evaluation, we conduct our empirical experiments on two benchmark data sets, 20 Newsgroups and Reuters. 20 Newsgroups is a well-known Web-related data collection. From the original 20 Newsgroups data set, same2, consisting of 2 very similar newsgroups (comp.windows.x, comp.os.ms-windows), is used to evaluate the performance of the algorithms. Same2 contains 2000 instances, 1000 for each class. We use the Rainbow software to preprocess the data (removing stop words and words whose document frequency is less than 3, and stemming), which yields 7765 unique terms for same2. Terms are then weighted with their TFIDF values.

The Reuters corpus contains Reuters news articles. We only show the experimental results on train1.svm in LWE, since the algorithms perform similarly on the other Reuters data sets. Train1.svm contains 1239 documents (two classes) and 6889 unique terms.

5.2. Evaluation Metric

We use the macro-average of the F1 measure over all classes to evaluate the classification result. For each class $i \in [1, c]$, let $A_i$ be the number of documents whose real label is $i$, $B_i$ the number of documents whose label is predicted to be $i$, and $C_i$ the number of correctly predicted documents in this class. The precision and recall of class $i$ are defined as $P_i = C_i / B_i$ and $R_i = C_i / A_i$ respectively.

For each class, the F1 metric is defined as $F1 = 2 P R / (P + R)$, where $P$ and $R$ are the precision and recall for that particular class. The F1 metric takes both precision and recall into account, so it is a more comprehensive metric than either precision or recall considered separately. The macro-averaged F1 evaluates the overall performance of the classification model. It is defined as:

$$\mathrm{Macro\_F1} = \frac{1}{c} \sum_{i=1}^{c} \frac{2 P_i R_i}{P_i + R_i} \qquad (1)$$
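Equation (1) can be computed directly from the per-class counts $A_i$, $B_i$ and $C_i$. The sketch below is illustrative only: the function name `macro_f1` is introduced here, the set of classes is assumed to be the labels that occur in the true labels, and treating an undefined precision, recall or F1 (zero denominator) as 0 is a convention chosen for this example, not something the paper specifies.

```python
from typing import Sequence


def macro_f1(true_labels: Sequence[int], predicted: Sequence[int]) -> float:
    """Macro-averaged F1 over the classes that occur in `true_labels`."""
    classes = sorted(set(true_labels))
    total = 0.0
    for c in classes:
        # A_i: documents whose real label is c.
        a = sum(1 for t in true_labels if t == c)
        # B_i: documents predicted to be c.
        b = sum(1 for p in predicted if p == c)
        # C_i: documents correctly predicted as c.
        hit = sum(1 for t, p in zip(true_labels, predicted) if t == p == c)
        precision = hit / b if b else 0.0  # assumed convention: 0 if undefined
        recall = hit / a if a else 0.0
        denom = precision + recall
        total += 2 * precision * recall / denom if denom else 0.0
    return total / len(classes)


# Toy usage with two classes and one misclassified document.
true_labels = [0, 0, 0, 1, 1, 1]
predicted = [0, 0, 1, 1, 1, 1]
print(round(macro_f1(true_labels, predicted), 4))  # 0.8286
```

In the toy example, class 0 has $P = 1$, $R = 2/3$ and F1 = 0.8, class 1 has $P = 3/4$, $R = 1$ and F1 = 6/7, so the macro average is about 0.8286.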

5.3. Experimental Results

The SVM light package is used in our experiments as the SVM implementation, with default configurations. We first compare ACTC and CBCSVM with different numbers of iterations on the two data sets. SVM and SemCCAc are used as baselines in order to see the benefits brought by our two-stage classification framework and by CBCSVM. We set p=0.5. Each experiment is run 30 times and the average results are given. The number of training examples is 5 for each class, randomly sampled in each run. Figures 2 and 3 give the Macro_F1 performance with different numbers of iterations on same2 and Reuters respectively.

In ACTC and CBCSVM, the parameter maxiter determines the number of self-labeled training examples. A larger value of maxiter means more self-labeled training data and hence a larger training data set for the final SVM training.

From Figure 2, we can see that ACTC significantly outperforms the other algorithms for any value of maxiter, and that its performance improves as maxiter increases. This indicates two things. One is that the SVM classifier benefits significantly from the augmented training data set, as shown by comparing its performance with that of SVM trained on the initial training data set. The other is that the self-labeled training data are of high accuracy, so that the benefit from them exceeds the negative effect of the noise they contain. This agrees with Figure 1 in Section 3. The performance of CBCSVM degrades slightly as maxiter increases. Because soft-constrained k-means cannot cope well with cluster bias, it introduces more noise into the self-labeled training data, which in turn has a negative effect on the SVM training. Such noise accumulates over the following iterations, which makes the final SVM perform worse than that in ACTC. SemCCAc outperforms SVM, which agrees with the earlier conclusion that unsupervised learning gives better performance than supervised learning when the training data set is extremely small.

[Figure 2: Macro-F1 (y-axis) versus maxiter (x-axis) on same2 for SemCCAc, ACTC, CBCSVM, and SVM.]

Fig. 2. Performance with maxiter on same2

[Figure 3: Macro-F1 (y-axis) versus maxiter (x-axis) on Reuters for SemCCAc, ACTC, CBCSVM, and SVM.]

Fig. 3. Performance with maxiter on Reuters

From Figure 3, we can see that ACTC outperforms the other algorithms only when maxiter>20. The reason may be that the training data self-labeled by SemCCAc are unbalanced across classes, and that SemCCAc may filter out useful examples when it copes with cluster bias. Both have a relatively larger effect on the final SVM performance when the training data set is small. This is expected to be improved by applying sampling techniques, e.g. over-sampling, to the training data set. On the Reuters data set, SemCCAc slightly outperforms CBCSVM and significantly outperforms SVM. This indicates that clustering gives better performance than SVM when the initial training data set is small.

To evaluate the performance of ACTC over a large range of labeled data, we run the algorithm together with CBCSVM, SVM and SemCCAc with different amounts of labeled data on the above two data sets. Figures 4 and 5 give the results. We set p=0.5 and maxiter=60. Each experiment is run 30 times and the average results are given. Training data are randomly sampled in each run.

[Figure 4: Macro-F1 (y-axis) versus number of training data in each class (x-axis) on same2 for SemCCAc, ACTC, CBCSVM, and SVM.]

Fig. 4. Performance with number of training data on same2

ACTC performs best on the two data sets for all sizes of the training data set. SVM performs worst when there are 5 training examples for each class, but its performance improves quickly as the training data increase. SVM outperforms CBCSVM and SemCCAc when the size of the training data set is larger than 20 on same2 and larger than 10 on Reuters. As the training data increase, the performance of ACTC, CBCSVM, and SemCCAc grows very slowly. For ACTC and CBCSVM, the reason may be the effect of the noise contained in the self-labeled training data. Therefore data editing or noise filtering techniques may help to improve the performance. After noise filtering, feature selection and sampling may also help to improve the overall performance. ACTC always significantly outperforms CBCSVM and SemCCAc, which indicates that our two-stage classification framework is superior to that of CBC, and that combining a generative model with a discriminative model can overcome the shortcomings of both.

[Figure 5: Macro-F1 (y-axis) versus number of training data in each class (x-axis) on Reuters for SemCCAc, ACTC, CBCSVM, and SVM.]

Fig. 5. Performance with size of training data set on Reuters

In ACTC and CBCSVM, the parameter p determines the number of self-labeled examples in each selection process. A larger value of p means that more examples are self-labeled in each selection, so fewer iterations are needed when the number of self-labeled examples is fixed. However, more noise may be introduced into the training data set (see Figure 1).

6. Conclusion

This paper presents an active semi-supervised clustering based two-stage classification framework for sparsely labeled text classification. In order to address the cluster bias problem, an active semi-supervised clustering method is proposed. We use a self-training style clustering method to augment the training data set, so that the challenging problem of sparsely labeled text classification is converted into a supervised one. Therefore supervised classification models, e.g. SVM, can be used, and useful techniques for supervised learning can be employed to further improve the performance. The experiments show the superior performance of our method over SVM and CBC (with SVM as base learner). In the future, we plan to evaluate other clustering methods for addressing the cluster bias problem, e.g. affinity propagation clustering and density based clustering. In terms of noise control, data editing or noise filtering techniques will also be explored. Other directions include investigating the problems of example selection, confidence assessment, and resampling techniques.

Acknowledgment. The authors would like to thank the anonymous reviewers for their useful advice. This work is partially supported by the National Natural Science Foundation of China (No , No , No , and No ), the special scientific research funding of Research Institute of Highway, Ministry of Transport (No ), and the Project of Education Department of Jiangxi Province (No.GJJ08415).

Xue Zhang received the BS degree in electronic engineering from Xian University, Xian, China. She received the MS degree in control theory and control engineering from Southwest University of Science and Technology, Mianyang, China, in 2003, and received the PhD degree in computer science from Southeast University, Nanjing, China. From 2008 to the present, she has been a postdoctoral fellow at Peking University. Her research interests include data mining and machine learning, with emphasis on the applications to text mining and bioinformatics.

Wangxin Xiao received the PhD degree in traffic information and control engineering from Southeast University, Nanjing, China. From 2005 to 2007, he engaged in postdoctoral research at Wuhan University of Technology. Since 2008 he has been an associate professor at the Research Institute of Highway, Ministry of Transport. From 2009 to 2011, he was also a postdoctoral fellow at Changsha University of Science and Technology. His research interests include pattern recognition, Intelligent Transport Systems (ITS), and data mining with applications to traffic data.

Received: January 30, 2012; Accepted: December 05.

ComSIS Vol. 9, No. 4, Special Issue, December 2012

18

Multi-agent System for Custom Relationship Management with SVMs Tool

Multi-agent System for Custom Relationship Management with SVMs Tool Mut-agent System for Custom Reatonshp Management wth SVMs oo Yanshan Xao, Bo Lu, 3, Dan Luo, and Longbng Cao Guangzhou Asan Games Organzng Commttee, Guangzhou 5063, P.R. Chna Facuty of Informaton echnoogy,

More information

An Ensemble Classification Framework to Evolving Data Streams

An Ensemble Classification Framework to Evolving Data Streams Internatona Journa of Scence and Research (IJSR) ISSN (Onne): 39-7064 An Ensembe Cassfcaton Framework to Evovng Data Streams Naga Chthra Dev. R MCA, (M.Ph), Sr Jayendra Saraswathy Maha Vdyaaya, Coege of

More information

An Efficient Job Scheduling for MapReduce Clusters

An Efficient Job Scheduling for MapReduce Clusters Internatona Journa of Future Generaton ommuncaton and Networkng, pp. 391-398 http://dx.do.org/10.14257/jfgcn.2015.8.2.32 An Effcent Job Schedung for MapReduce usters Jun Lu 1, Tanshu Wu 1, and Mng We Ln

More information

Forecasting the Direction and Strength of Stock Market Movement

Forecasting the Direction and Strength of Stock Market Movement Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye cjngwe@stanford.edu mchen5@stanford.edu nanye@stanford.edu Abstract - Stock market s one of the most complcated systems

More information

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis

The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna wangtngzhong2@sna.cn Abstract.

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

Approximation Algorithms for Data Distribution with Load Balancing of Web Servers

Approximation Algorithms for Data Distribution with Load Balancing of Web Servers Approxmaton Agorthms for Data Dstrbuton wth Load Baancng of Web Servers L-Chuan Chen Networkng and Communcatons Department The MITRE Corporaton McLean, VA 22102 chen@mtreorg Hyeong-Ah Cho Department of

More information

Prediction of Success or Fail of Students on Different Educational Majors at the End of the High School with Artificial Neural Networks Methods

Prediction of Success or Fail of Students on Different Educational Majors at the End of the High School with Artificial Neural Networks Methods Predcton of Success or Fa of on Dfferent Educatona Maors at the End of the Hgh Schoo th Artfca Neura Netors Methods Sayyed Mad Maznan, Member, IACSIT, and Sayyede Azam Aboghasempur Abstract The man obectve

More information

Predicting Advertiser Bidding Behaviors in Sponsored Search by Rationality Modeling

Predicting Advertiser Bidding Behaviors in Sponsored Search by Rationality Modeling Predctng Advertser Bddng Behavors n Sponsored Search by Ratonaty Modeng Hafeng Xu Centre for Computatona Mathematcs n Industry and Commerce Unversty of Wateroo Wateroo, ON, Canada hafeng.ustc@gma.com Dy

More information

SUPPORT VECTOR MACHINE FOR REGRESSION AND APPLICATIONS TO FINANCIAL FORECASTING

SUPPORT VECTOR MACHINE FOR REGRESSION AND APPLICATIONS TO FINANCIAL FORECASTING SUPPORT VECTOR MACHINE FOR REGRESSION AND APPICATIONS TO FINANCIA FORECASTING Theodore B. Trafas and Husen Ince Schoo of Industra Engneerng Unverst of Okahoma W. Bod Sute 4 Norman Okahoma 739 trafas@ecn.ou.edu;

More information

Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis

Adaptive Multi-Compositionality for Recursive Neural Models with Applications to Sentiment Analysis Proceedngs of the Twenty-Eghth AAAI Conference on Artfca Integence Adapte Mut-Compostonaty for Recurse Neura Modes wth Appcatons to Sentment Anayss L Dong Furu We Mng Zhou Ke Xu State Key Lab of Software

More information

Cardiovascular Event Risk Assessment Fusion of Individual Risk Assessment Tools Applied to the Portuguese Population

Cardiovascular Event Risk Assessment Fusion of Individual Risk Assessment Tools Applied to the Portuguese Population Cardovascuar Event Rsk Assessment Fuson of Indvdua Rsk Assessment Toos Apped to the Portuguese Popuaton S. Paredes, T. Rocha, P. de Carvaho, J. Henrques, J. Moras*, J. Ferrera, M. Mendes Abstract Cardovascuar

More information

Predictive Control of a Smart Grid: A Distributed Optimization Algorithm with Centralized Performance Properties*

Predictive Control of a Smart Grid: A Distributed Optimization Algorithm with Centralized Performance Properties* Predctve Contro of a Smart Grd: A Dstrbuted Optmzaton Agorthm wth Centrazed Performance Propertes* Phpp Braun, Lars Grüne, Chrstopher M. Keett 2, Steven R. Weer 2, and Kar Worthmann 3 Abstract The authors

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

SIMPLIFYING NDA PROGRAMMING WITH PROt SQL

SIMPLIFYING NDA PROGRAMMING WITH PROt SQL SIMPLIFYING NDA PROGRAMMING WITH PROt SQL Aeen L. Yam, Besseaar Assocates, Prnceton, NJ ABSRACf The programmng of New Drug Appcaton (NDA) Integrated Summary of Safety (ISS) usuay nvoves obtanng patent

More information

A Resources Allocation Model for Multi-Project Management

A Resources Allocation Model for Multi-Project Management A Resources Aocaton Mode for Mut-Proect Management Hamdatou Kane, Aban Tsser To cte ths verson: Hamdatou Kane, Aban Tsser. A Resources Aocaton Mode for Mut-Proect Management. 9th Internatona Conference

More information

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS

INVESTIGATION OF VEHICULAR USERS FAIRNESS IN CDMA-HDR NETWORKS 21 22 September 2007, BULGARIA 119 Proceedngs of the Internatonal Conference on Informaton Technologes (InfoTech-2007) 21 st 22 nd September 2007, Bulgara vol. 2 INVESTIGATION OF VEHICULAR USERS FAIRNESS

More information

A Secure Password-Authenticated Key Agreement Using Smart Cards

A Secure Password-Authenticated Key Agreement Using Smart Cards A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,

More information

Hacia un Modelo de Red Inmunológica Artificial Basado en Kernels. Towards a Kernel Based Model for Artificial Immune Networks

Hacia un Modelo de Red Inmunológica Artificial Basado en Kernels. Towards a Kernel Based Model for Artificial Immune Networks Haca un Modeo de Red Inmunoógca Artfca Basado en Kernes Towards a Kerne Based Mode for Artfca Immune Networs Juan C. Gaeano, Ing. 1, Fabo A. Gonzáez, PhD. 1 Integent Systems Research Lab, Natona Unversty

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

Searching for Interacting Features for Spam Filtering

Searching for Interacting Features for Spam Filtering Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages

Assessing Student Learning Through Keyword Density Analysis of Online Class Messages Assessng Student Learnng Through Keyword Densty Analyss of Onlne Class Messages Xn Chen New Jersey Insttute of Technology xc7@njt.edu Brook Wu New Jersey Insttute of Technology wu@njt.edu ABSTRACT Ths

More information

Neural Network-based Colonoscopic Diagnosis Using On-line Learning and Differential Evolution

Neural Network-based Colonoscopic Diagnosis Using On-line Learning and Differential Evolution Neura Networ-based Coonoscopc Dagnoss Usng On-ne Learnng and Dfferenta Evouton George D. Magouas, Vasss P. Paganaos * and Mchae N. Vrahats * Department of Informaton Systems and Computng, Brune Unversty,

More information

Web Spam Detection Using Machine Learning in Specific Domain Features

Web Spam Detection Using Machine Learning in Specific Domain Features Journal of Informaton Assurance and Securty 3 (2008) 220-229 Web Spam Detecton Usng Machne Learnng n Specfc Doman Features Hassan Najadat 1, Ismal Hmed 2 Department of Computer Informaton Systems Faculty

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

TCP/IP Interaction Based on Congestion Price: Stability and Optimality

TCP/IP Interaction Based on Congestion Price: Stability and Optimality TCP/IP Interacton Based on Congeston Prce: Stabty and Optmaty Jayue He Eectrca Engneerng Prnceton Unversty Ema: jhe@prncetonedu Mung Chang Eectrca Engneerng Prnceton Unversty Ema: changm@prncetonedu Jennfer

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

Swing-Free Transporting of Two-Dimensional Overhead Crane Using Sliding Mode Fuzzy Control

Swing-Free Transporting of Two-Dimensional Overhead Crane Using Sliding Mode Fuzzy Control Swng-Free Transportng of Two-Dmensona Overhead Crane Usng Sdng Mode Fuzzy Contro Dantong Lu, Janqang, Dongn Zhao, and We Wang Astract An adaptve sdng mode fuzzy contro approach s proposed for a two-dmensona

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

Agglomeration economies in manufacturing industries: the case of Spain

Agglomeration economies in manufacturing industries: the case of Spain Aggomeraton economes n manufacturng ndustres: the case of Span Oga Aonso-Var José-María Chamorro-Rvas Xua Gonzáez-Cerdera Unversdade de Vgo October 001 Abstract: Ths paper anayses the extent of geographca

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

Asymptotically Optimal Inventory Control for Assemble-to-Order Systems with Identical Lead Times

Asymptotically Optimal Inventory Control for Assemble-to-Order Systems with Identical Lead Times Asymptotcay Optma Inventory Contro for Assembe-to-Order Systems wth Identca ead Tmes Martn I. Reman Acate-ucent Be abs, Murray H, NJ 07974, marty@research.be-abs.com Qong Wang Industra and Enterprse Systems

More information

A Simple Congestion-Aware Algorithm for Load Balancing in Datacenter Networks

A Simple Congestion-Aware Algorithm for Load Balancing in Datacenter Networks A Smpe Congeston-Aware Agorthm for Load Baancng n Datacenter Networs Mehrnoosh Shafee, and Javad Ghader, Coumba Unversty Abstract We study the probem of oad baancng n datacenter networs, namey, assgnng

More information

How To Classfy Onlne Mesh Network Traffc Classfcaton And Onlna Wreless Mesh Network Traffic Onlnge Network

How To Classfy Onlne Mesh Network Traffc Classfcaton And Onlna Wreless Mesh Network Traffic Onlnge Network Journal of Computatonal Informaton Systems 7:5 (2011) 1524-1532 Avalable at http://www.jofcs.com Onlne Wreless Mesh Network Traffc Classfcaton usng Machne Learnng Chengje GU 1,, Shuny ZHANG 1, Xaozhen

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

Research on Single and Mixed Fleet Strategy for Open Vehicle Routing Problem

Research on Single and Mixed Fleet Strategy for Open Vehicle Routing Problem 276 JOURNAL OF SOFTWARE, VOL 6, NO, OCTOBER 2 Research on Snge and Mxed Feet Strategy for Open Vehce Routng Probe Chunyu Ren Heongjang Unversty /Schoo of Inforaton scence and technoogy, Harbn, Chna Ea:

More information

DEFINING %COMPLETE IN MICROSOFT PROJECT

DEFINING %COMPLETE IN MICROSOFT PROJECT CelersSystems DEFINING %COMPLETE IN MICROSOFT PROJECT PREPARED BY James E Aksel, PMP, PMI-SP, MVP For Addtonal Informaton about Earned Value Management Systems and reportng, please contact: CelersSystems,

More information

Comparison of workflow software products

Comparison of workflow software products Internatona Conference on Computer Systems and Technooges - CompSysTech 2006 Comparson of worfow software products Krasmra Stoova,Todor Stoov Abstract: Ths research addresses probems, reated to the assessment

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

Using Content-Based Filtering for Recommendation 1

Using Content-Based Filtering for Recommendation 1 Usng Content-Based Flterng for Recommendaton 1 Robn van Meteren 1 and Maarten van Someren 2 1 NetlnQ Group, Gerard Brandtstraat 26-28, 1054 JK, Amsterdam, The Netherlands, robn@netlnq.nl 2 Unversty of

More information

Dynamic Virtual Network Allocation for OpenFlow Based Cloud Resident Data Center

Dynamic Virtual Network Allocation for OpenFlow Based Cloud Resident Data Center 56 IEICE TRANS. COMMUN., VOL.E96 B, NO. JANUARY 203 PAPER Speca Secton on Networ Vrtuazaton, and Fuson Patform of Computng and Networng Dynamc Vrtua Networ Aocaton for OpenFow Based Coud Resdent Data Center

More information

Gender Classification for Real-Time Audience Analysis System

Gender Classification for Real-Time Audience Analysis System Gender Classfcaton for Real-Tme Audence Analyss System Vladmr Khryashchev, Lev Shmaglt, Andrey Shemyakov, Anton Lebedev Yaroslavl State Unversty Yaroslavl, Russa vhr@yandex.ru, shmaglt_lev@yahoo.com, andrey.shemakov@gmal.com,

More information

XAC08-6 Professional Project Management

XAC08-6 Professional Project Management 1 XAC08-6 Professona Project anagement Ths Lecture: Tte s so manager shoud ncude a s management pan a document that gudes any experts agree Some faure project to Ba, ba, ba, ba Communcaton anagement Wee

More information

Multi-sensor Data Fusion for Cyber Security Situation Awareness

Multi-sensor Data Fusion for Cyber Security Situation Awareness Avalable onlne at www.scencedrect.com Proceda Envronmental Scences 0 (20 ) 029 034 20 3rd Internatonal Conference on Envronmental 3rd Internatonal Conference on Envronmental Scence and Informaton Applcaton

More information

Product Quality and Safety Incident Information Tracking Based on Web

Product Quality and Safety Incident Information Tracking Based on Web Product Qualty and Safety Incdent Informaton Trackng Based on Web News 1 Yuexang Yang, 2 Correspondng Author Yyang Wang, 2 Shan Yu, 2 Jng Q, 1 Hual Ca 1 Chna Natonal Insttute of Standardzaton, Beng 100088,

More information

Increasing Supported VoIP Flows in WMNs through Link-Based Aggregation

Increasing Supported VoIP Flows in WMNs through Link-Based Aggregation Increasng Supported VoIP Fows n WMNs through n-based Aggregaton J. Oech, Y. Hamam, A. Kuren F SATIE TUT Pretora, South Afrca oechr@gma.com T. Owa Meraa Insttute Counc of Scentfc and Industra Research (CSIR)

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

Master s Thesis. Configuring robust virtual wireless sensor networks for Internet of Things inspired by brain functional networks

Master s Thesis. Configuring robust virtual wireless sensor networks for Internet of Things inspired by brain functional networks Master s Thess Ttle Confgurng robust vrtual wreless sensor networks for Internet of Thngs nspred by bran functonal networks Supervsor Professor Masayuk Murata Author Shnya Toyonaga February 10th, 2014

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

Fault tolerance in cloud technologies presented as a service

Fault tolerance in cloud technologies presented as a service Internatonal Scentfc Conference Computer Scence 2015 Pavel Dzhunev, PhD student Fault tolerance n cloud technologes presented as a servce INTRODUCTION Improvements n technques for vrtualzaton and performance

More information

Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style

Factored Conditional Restricted Boltzmann Machines for Modeling Motion Style Factored Condtona Restrcted Botzmann Machnes for Modeng Moton Stye Graham W. Tayor GWTAYLOR@CS.TORONTO.EDU Geoffrey E. Hnton HINTON@CS.TORONTO.EDU Department of Computer Scence, Unversty of Toronto, Toronto,

More information

Support Vector Machines

Support Vector Machines Support Vector Machnes Max Wellng Department of Computer Scence Unversty of Toronto 10 Kng s College Road Toronto, M5S 3G5 Canada wellng@cs.toronto.edu Abstract Ths s a note to explan support vector machnes.

More information

Semantic Link Analysis for Finding Answer Experts *

Semantic Link Analysis for Finding Answer Experts * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 51-65 (2012) Semantc Lnk Analyss for Fndng Answer Experts * YAO LU 1,2,3, XIAOJUN QUAN 2, JINGSHENG LEI 4, XINGLIANG NI 1,2,3, WENYIN LIU 2,3 AND YINLONG

More information

An Adaptive and Distributed Clustering Scheme for Wireless Sensor Networks

An Adaptive and Distributed Clustering Scheme for Wireless Sensor Networks 2007 Internatonal Conference on Convergence Informaton Technology An Adaptve and Dstrbuted Clusterng Scheme for Wreless Sensor Networs Xnguo Wang, Xnmng Zhang, Guolang Chen, Shuang Tan Department of Computer

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information

7.5. Present Value of an Annuity. Investigate

7.5. Present Value of an Annuity. Investigate 7.5 Present Value of an Annuty Owen and Anna are approachng retrement and are puttng ther fnances n order. They have worked hard and nvested ther earnngs so that they now have a large amount of money on

More information

Luby s Alg. for Maximal Independent Sets using Pairwise Independence

Luby s Alg. for Maximal Independent Sets using Pairwise Independence Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

Product Approximate Reasoning of Online Reviews Applying to Consumer Affective and Psychological Motives Research

Product Approximate Reasoning of Online Reviews Applying to Consumer Affective and Psychological Motives Research Appled Mathematcs & Informaton Scences An Internatonal Journal 2011 NSP 5 (2) (2011), 45S-51S Product Approxmate Reasonng of Onlne Revews Applyng to Consumer Affectve and Psychologcal Motves Research Narsa

More information

Simple Interest Loans (Section 5.1) :

Simple Interest Loans (Section 5.1) : Chapter 5 Fnance The frst part of ths revew wll explan the dfferent nterest and nvestment equatons you learned n secton 5.1 through 5.4 of your textbook and go through several examples. The second part

More information

Mining Multiple Large Data Sources

Mining Multiple Large Data Sources The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of

More information

2) A single-language trained classifier: one. classifier trained on documents written in

2) A single-language trained classifier: one. classifier trained on documents written in Openng the ega terature Porta to mutngua access E. Francescon, G. Perugne ITTIG Insttute of Lega Informaton Theory and Technooges Itaan Natona Research Counc, Forence, Itay Te: +39 055 43999 Fax: +39 055

More information

The Dynamics of Wealth and Income Distribution in a Neoclassical Growth Model * Stephen J. Turnovsky. University of Washington, Seattle

The Dynamics of Wealth and Income Distribution in a Neoclassical Growth Model * Stephen J. Turnovsky. University of Washington, Seattle The Dynamcs of Weath and Income Dstrbuton n a Neocassca Growth Mode * Stephen J. Turnovsy Unversty of Washngton, Seatte Ceca García-Peñaosa CNRS and GREQAM March 26 Abstract: We examne the evouton of the

More information

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT

APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT APPLICATION OF PROBE DATA COLLECTED VIA INFRARED BEACONS TO TRAFFIC MANEGEMENT Toshhko Oda (1), Kochro Iwaoka (2) (1), (2) Infrastructure Systems Busness Unt, Panasonc System Networks Co., Ltd. Saedo-cho

More information

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh

More information

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement An Enhanced Super-Resoluton System wth Improved Image Regstraton, Automatc Image Selecton, and Image Enhancement Yu-Chuan Kuo ( ), Chen-Yu Chen ( ), and Chou-Shann Fuh ( ) Department of Computer Scence

More information
