High Performance Latent Dirichlet Allocation for Text Mining


A thesis submitted for the Degree of Doctor of Philosophy

Department of Electronic and Computer Engineering, School of Engineering and Design, Brunel University

September 2013

Abstract

Latent Dirichlet Allocation (LDA), a fully probabilistic generative model, is a three-tier Bayesian model. LDA computes the latent topic structure of the data and extracts the significant information of documents. However, traditional LDA has several limitations in practical applications. LDA cannot be used directly in classification because it is an unsupervised learning model; it needs to be embedded into an appropriate classification algorithm. As a generative model, LDA normally also generates latent topics in categories to which the target documents do not belong, which introduces deviation into the computation and reduces classification accuracy. The number of topics in LDA greatly influences the learning of the model parameters. Noise samples in the training data also affect the final text classification result, and the quality of LDA-based classifiers depends to a great extent on the quality of the training samples. Although parallel LDA algorithms have been proposed to deal with huge amounts of data, balancing computing loads in a computer cluster poses another challenge.

This thesis presents a text classification method which combines the LDA model and the Support Vector Machine (SVM) classification algorithm to improve classification accuracy while reducing the dimension of datasets. Based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the algorithm automatically optimizes the number of topics to be selected, which reduces the number of iterations in computation. Furthermore, this thesis presents a noise data reduction scheme to process noisy data. Even when the noise ratio in the training data set is large, the noise reduction scheme still produces a high level of classification accuracy. Finally, the thesis parallelizes LDA using the MapReduce model, which is the de facto computing standard for supporting data-intensive applications. A genetic algorithm based load balancing algorithm is designed to balance the workloads among computers in a heterogeneous MapReduce cluster where the computers have a variety of computing resources in terms of CPU speed, memory space and hard disk space.

Acknowledgement

I would like to thank many people for their help. First of all, I would like to thank my PhD supervisor, Prof. Maozhen Li. Throughout the whole research, he gave me the most help and guidance from beginning to end. In addition, his advice and support encouraged me greatly, so that I could face all the difficulties. Not only did I learn more about my research, but I also learned how to analyze and solve problems. I also thank Xiaoyu Chen, Yang Lu and Yu Zhao. They gave me many valuable opinions, as well as support and care in daily life. Moreover, I would like to thank the School of Engineering and Design and Brunel University. During my PhD research years, I received help in every respect from the School and the University. Finally, I thank my parents, girlfriend and housemates. They gave me the greatest support and courage during the most difficult times. Here, I would like to express my heartfelt thanks to all the friends who have helped me. I shall remember the days I spent at Brunel forever.

Author's Declaration

The work described in this thesis has not been previously submitted for a degree in this or any other university and, unless otherwise referenced, it is the author's own work.

Statement of Copyright

The copyright of this thesis rests with the author. No quotation from it should be published without his prior written consent, and information derived from it should be acknowledged.

List of Abbreviations

AD-LDA: Approximate Distributed LDA
API: Application Program Interface
BP: Belief Propagation
CTM: Correlated Topic Model
DAG: Directed Acyclic Graph
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
DLB: Dynamic Load Balancing
DP: Dirichlet Process
EM: Expectation-Maximization
FIFO: First In First Out
GA: Genetic Algorithm
GFS: Google File System
GS: Gibbs Sampling
HDFS: Hadoop Distributed File System
HD-LDA: Hierarchical Distributed LDA
HDP: Hierarchical Dirichlet Process
ICE: Internet Communications Engine
Intel TBB: Intel Threading Building Blocks
JDK: Java Development Kit
KLD: Kullback-Leibler Divergence
K-NN: K-Nearest Neighbor
LDA: Latent Dirichlet Allocation
LSI: Latent Semantic Indexing
MCMC: Markov Chain Monte Carlo
MPI: Message Passing Interface
PAM: Pachinko Allocation Model
PLDA: Parallel Latent Dirichlet Allocation
PLSI: Probabilistic Latent Semantic Indexing
SGA: Simple Genetic Algorithm
SSH: Secure Shell
SVD: Singular Value Decomposition
SVM: Support Vector Machines
TF-IDF: Term Frequency-Inverse Document Frequency
VB: Variational Bayes
VI: Variational Inference
VSM: Vector Space Model

Table of Contents

Abstract
Acknowledgement
Author's Declaration
Statement of Copyright
List of Abbreviations
Table of Contents
List of Figures
List of Tables

Chapter 1 Introduction
  Background
  Text Mining Techniques
  High Performance Computing for Text Mining
  Motivation of Work
  Major Contributions
  Structure of the Thesis

Chapter 2 Literature Review
  Introduction
  Probability Topic Models
  TF-IDF Model
  Mixture of Unigrams
  LSI Model
  Basic Concepts
  The Modeling Process
  The Advantages of LSI
  The Disadvantages of LSI
  PLSI Model
  Basic Concepts
  The Modeling Process
  The Advantages of PLSI
  The Disadvantages of PLSI
  LDA Model
  Basic Concepts
  The Modeling Process
  The Advantages of LDA
  An Overview of the Main Inference Algorithms of LDA Model Parameters
  Variational Inference (VI)
  Belief Propagation (BP)
  Gibbs Sampling (GS)
  Analysis and Discussion
  An Overview of Genetic Algorithm
  The Basic Idea of Genetic Algorithm
  The Main Steps of Genetic Algorithm
  Coding Mechanism
  Fitness Function
  Selection
  Crossover
  Mutation
  The Parameter Settings of Genetic Algorithms
  An Overview of Hadoop
  HDFS
  MapReduce Programming Model in Hadoop
  Hadoop Scheduling Algorithms
  Divisible Load Theory
  Summary

Chapter 3 Text Classification with Latent Dirichlet Allocation
  Introduction
  Overview of Text Classification
  3.2.1 The Content of Text Classification
  Text Preprocessing
  Text Representation
  Text Feature Extraction and Dimension Reduction
  Text Classification Algorithms
  Classification Performance Evaluation System
  Text Classification based on LDA
  Gibbs Sampling Approximate Inference Parameters of LDA Model
  The Specific Steps of Text Classification
  Experiment and Analysis
  Experimental Environment
  Training Environment for SVM
  The Data Set
  Evaluation Methods
  Experimental Results
  Summary

Chapter 4 Accuracy Enhancement with Optimized Topics and Noise Processing
  Introduction
  The Method of Selecting the Optimal Number of Topics
  Current Main Selection Methods of the Optimal Number of Topics Based on LDA Model
  The Method of Selecting the Optimal Number of Topics Based on HDP
  The Standard Method of Bayesian Statistics
  A Density-based Clustering Method of Selecting the Optimal Number of Topics in LDA
  DBSCAN Algorithm
  The Relationship between the Optimal Model and the Topic Similarity
  A Method of Selecting the Optimal Number of Topics Based on DBSCAN
  Experiment and Result Analysis
  Noisy Data Reduction
  The Noise Problem in Text Classification
  The Current Main LDA-based Methods of Noise Processing
  The Data Smoothing Based on the Generation Process of Topic Model
  The Category Entropy Based on LDA Model
  A Noise Data Reduction Scheme
  The Experiment and Result Analysis
  Summary

Chapter 5 Genetic Algorithm based Static Load Balancing for Parallel Latent Dirichlet Allocation
  Introduction
  The Current Main Parallelization Methods of LDA
  Mahout's Parallelization of LDA
  Yahoo's Parallel Topic Model
  The Algorithm of Parallel LDA Based on Belief Propagation
  Google's PLDA
  Analysis and Discussion
  Paralleling LDA with MapReduce/Hadoop
  The Working Process of MapReduce on Hadoop
  The Algorithm and Implementation of PLDA
  A Static Load Balancing Strategy Based on Genetic Algorithm for PLDA in Hadoop
  The Algorithm Design
  The Design and Implementation of Genetic Algorithm
  Encoding Scheme
  The Initialization of Population
  Fitness Function
  Crossover
  Mutation
  The optimal retention strategy
  Experiment and Analysis
  Evaluating PLDA in Hadoop
  The Experimental Environment
  Experimental Data
  The Experiment in the Homogeneous Environment
  The Experiment in the Heterogeneous Environment
  The Experiment of Load Balancing in a Simulated Environment
  Summary

Chapter 6 Conclusion and Future Works
  Conclusion
  Future Works
  The Supervised LDA
  The Improved PLDA
  PLDA Dynamic Load Balancing Problem
  The Application of Cloud Computing Platform
  The Application of Interdisciplinary Field

References

List of Figures

Figure 2.1: The generative process of the topic model
Figure 2.2: (Left) The unigrams (Right) The mixture of unigrams
Figure 2.3: The diagram of singular value decomposition (SVD)
Figure 2.4: The graphical model representation of PLSI
Figure 2.5: The network topology of LDA latent topics
Figure 2.6: (Left) The structure diagram of LDA latent topics (Right) The graphical model representation of LDA
Figure 2.7: The LDA probabilistic graphical model with the variational parameters
Figure 2.8: Belief propagation in the LDA model based on the factor graph
Figure 2.9: The typical MapReduce framework in Hadoop
Figure 3.1: The typical process of automatic text classification
Figure 3.2: The separating hyperplane of SVM algorithm
Figure 3.3: Comparison of the performance of three methods on each class
Figure 4.1: HDP model
Figure 4.2: The relationship between log P(w|T) and T
Figure 4.3: The data smoothing based on LDA weakens the influence of noise samples
Figure 4.4: The flow diagram of a noise data reduction scheme
Figure 4.5: The relationship between the number of iterations and the effect of noise processing with different noise ratios
Figure 4.6: Classification results of different methods with various noise ratios in first group of data
Figure 4.7: Classification results of different methods with various noise ratios in second group of data
Figure 4.8: Classification results of different methods with various noise ratios in third group of data
Figure 5.1: The framework of one Gibbs sampling iteration in MapReduce-LDA
Figure 5.2: The computing time with different number of data nodes in a homogeneous cluster
Figure 5.3: A comparison of computing time of dealing with different sizes of data with eight data nodes
Figure 5.4: The performance comparison of the load balancing strategy with the different heterogeneity in a simulated environment
Figure 5.5: The performance comparison of PLDA with different sizes of data in a simulated environment
Figure 5.6: The convergence of genetic algorithm in the load balancing strategy

List of Tables

Table 3.1: The distribution of five classes of text in the 20newsgroup corpus
Table 3.2: Comparison of three methods' macro-average and micro-average
Table 3.3: Comparison of three methods' dimensionality reduction degree to corpus
Table 4.1: Results of the proposed algorithm to find the optimal value of topic
Table 4.2: Experimental data sets
Table 5.1: The features comparison of implementing PLDA with MPI and MapReduce
Table 5.2: The configuration of the experimental environment
Table 5.3: The configuration of nodes in a Hadoop cluster
Table 5.4: The experimental result of single machine and one data node in the cluster processing the data
Table 5.5: The configuration of the simulated Hadoop environment

Chapter 1 Introduction

1.1 Background

The Internet has accumulated a huge amount of digital information, including news, blogs, web pages, e-books, images, audio, video, social networking and other forms of data, and the amount keeps growing explosively [1]. How to organize and manage large-scale data effectively and obtain the required information quickly has therefore become a huge challenge. For example, the data may be too large for traditional data analysis tools and techniques to handle. Sometimes, even if the data set is relatively small, traditional methods cannot be used because of the untraditional characteristics of the data [2] [3]. In addition, expert system technology relies on special users or domain experts putting knowledge into the knowledge base manually. Unfortunately, this process is prone to deviations and mistakes, and it is time-consuming and costly [4] [5]. Therefore, it is necessary to develop new technologies and automatic tools which can intelligently convert massive data into useful information and knowledge.

Data mining is a technique that combines traditional data analysis methods with complex algorithms able to deal with large amounts of data. It is a complex process in which unknown and valuable patterns or rules are extracted from mass data. Furthermore, it is an interdisciplinary field, closely related to database systems, statistics, machine learning, information science and other disciplines [1] [2] [4]. Data mining can thus be seen as the result of the natural evolution of information technology. According to the object being processed, it can be divided into object data mining, spatial data mining, multimedia data mining, Web mining and text mining [3] [6].

Text is the most important representation of information. Statistical research showed that 80 percent of the information in an organization was stored in the form of text, including books, research papers, news articles, Web pages, e-mail and so on [7]. Text can express vast and abundant information, and it also contains a great deal of undetected potential knowledge. A text collection is not structured data and lacks machine-understandable semantics, so it is quite difficult to deal with a huge number of documents. In the field of data mining, a new technology which can process such text data effectively was therefore proposed, called text mining [5] [6] [8].

1.1.1 Text Mining Techniques

Text mining was first proposed by Ronen Feldman et al in 1995 and was described as "the process of extracting interesting patterns from very large text collections for the purpose of discovering knowledge" [6]. Text mining is also known as text data mining or knowledge discovery in texts: a process in which unknown, potential, understandable and useful knowledge is found in mass text data [2] [6] [9] [10]. It is also a process of analyzing text data, extracting text information and finding text knowledge [9] [11] [12].

The main techniques of text mining include text classification, text clustering, text summarization, correlation analysis, information extraction, distribution analysis, trend prediction and so on [13] [14] [15]. Text classification and text clustering are mining methods whose object is the text set, whereas the processing object of text summarization and information extraction is a single document. There are many classification methods in text classification; the frequently used ones include Naive Bayes (NB), K-Nearest Neighbor (K-NN), Support Vector Machines (SVM), Vector Space Model (VSM) and Linear Least Square Fit (LLSF) [6][7][8][9][12][13][14][15].

In the field of text mining, machine learning researchers put forward the probabilistic topic model, in which fast unsupervised machine learning algorithms are used to find the hidden information of text automatically [16] [17] [18] [19]. At present, the main models for mining latent semantic knowledge are Latent Semantic Indexing (LSI) [20] [21] [22] [23] [24], Probabilistic Latent Semantic Indexing (PLSI) [25] [26] [27] [28] [29] and Latent Dirichlet Allocation (LDA) [30] [31] [32]. Their applications cover almost all areas of text mining and information processing, such as text summarization, information retrieval and text classification [33] [34] [35]. In particular, Blei et al put forward the LDA model in 2003, which has been widely used for text classification, text annotation and other problems. It also gave rise to a series of text processing methods based on probabilistic topic modeling [17] [36], and it was extended to images, audio, video and other multimedia data processing fields [37] [38].

LDA is a probabilistic topic model for discrete data sets such as text collections [30]. It treats documents as probability distributions over topics and simplifies the generative process of the text, which helps to handle large-scale text sets efficiently. In addition, LDA is a three-tier Bayesian model, with a three-tier structure of words, topics and documents. Each document is expressed as a mixture of topics, where each topic is a probability distribution over a fixed word list. In brief, the basic modeling process of LDA first establishes a document-word co-occurrence matrix. Then a text training set is modeled. Next, inference methods are used to obtain the model parameters, such as the document-topic matrix and the topic-word matrix. Finally, the learned model is used to predict the topic probability distribution of new documents so as to express text information [17] [30] [32].

Compared with other topic models, LDA has some unique advantages. Firstly, the LDA topic model is a completely probabilistic generative model, so mature probabilistic algorithms can be used to train it directly, and the model is easy to use [39]. Secondly, the size of the parameter space of LDA is fixed and has nothing to do with the size of the text set, so it is more suitable for large-scale text sets [40]. Thirdly, LDA is a hierarchical model, which is more stable than non-hierarchical models [39].

1.1.2 High Performance Computing for Text Mining

In terms of processing speed, storage space, fault tolerance and access speed, the traditional technical architecture and the single computer with a serial approach are increasingly unsuitable for handling mass data [41] [42] [43] [44]. Parallel computing is an effective method of improving the computation speed and processing capacity of a computer system. Its basic idea is to decompose a problem into several parts, each of which is computed by an independent processor in parallel [45] [46] [47]. A parallel computing system can be a supercomputer with multiple processors, or a cluster made up of several independent computers that are interconnected in some way [44] [48]. However, with the parallel programming models of traditional high-performance computing, such as PThread, MPI and OpenMP [49] [50], developers have to understand low-level configurations and parallel implementation details.

In 2004, Jeffrey Dean and Sanjay Ghemawat, two engineers at Google, proposed the MapReduce programming model [51] and applied it to the parallel distributed computing of large-scale data sets. Following functional programming ideas, the MapReduce framework divides a computational process into two phases, Map and Reduce [52] [53] [54]. In the Map phase, the input data is passed to map functions in the form of key-value pairs, and an intermediate set of key-value pairs is generated. In the Reduce phase, all the intermediate values with the same key are merged. The final result is output in the form of key-value pairs. Users only need to implement two functions, map and reduce, which significantly reduces the complexity of designing parallel programs [51] [52].

Then, Google CEO Eric Schmidt first proposed the concept of cloud computing in 2006 [43]. The major international Internet and information technology companies launched a series of products in succession, showing their research results [55] [56]. For example, Google launched Google App Engine, a development platform based on the cloud computing environment. Amazon launched Amazon EC2 (Amazon Elastic Compute Cloud), a powerful cloud computing platform, and Amazon S3 (Amazon Simple Storage Service), which provides a cloud storage service. Microsoft launched Windows Azure, a cloud computing platform. Cloud computing is the result of the comprehensive development of parallel computing, distributed computing and grid computing. In other words, cloud computing is the commercial implementation of these computing science concepts [43] [57].

Cloud computing is a new computing mode and a new way of combining computer resources, and it represents an innovative business model [55]. Its basic principle is to distribute the calculation, transmission and storage that would otherwise be required of local computers or remote servers across a large number of distributed computers, so that the computers share huge calculation, transmission and storage tasks together. Users obtain the corresponding services through the network, which improves the resource utilization rate effectively and realizes resource acquisition on demand [58] [59].

The above cloud computing platforms are commercial products, which are not freely available to most researchers. The emergence of open source cloud computing platforms therefore gives researchers an opportunity. Researchers can use these open source projects to build a cluster system made up of several machines in a laboratory environment and simulate the cloud computing environment of business systems [57] [58]. Hadoop is one of the most famous open source distributed computing frameworks, and it implements the main technologies of Google's cloud computing [60] [61] [62]. Its core consists of the Hadoop Distributed File System (HDFS) [53] [60] [61] [63] and the MapReduce programming model [64] [65] [66]. Hadoop originated in the Lucene and Nutch projects and then developed into a separate project of the Apache foundation. It is mainly used for processing mass data [53] [60] [63]. It provides a Java-based MapReduce framework with strong portability, which allows distributed computing to be applied to large low-cost clusters. For individuals and companies, its high performance lays the foundation for the research and application of their distributed computing [67] [68]. Yahoo was the first company to use, research and develop Hadoop in depth. At Yahoo, there were more than ten thousand CPUs in a single cluster using Hadoop, and hundreds of researchers used Hadoop [52] [54]. Nowadays, it has become the mainstream technology of cloud computing, researched and used by universities, research institutions and Internet companies [67]. Along with the increasing popularity of the cloud computing concept, Hadoop will get more attention and faster development [65].

According to the digital information report published by International Data Corporation (IDC) in 2011, the amount of global information would double every two years [1]. There is no doubt that this puts great pressure on the data storage and computing power of servers, so that traditional topic model algorithms cannot be used to process large-scale text sets even on high-performance servers [44] [69]. In addition, people cannot rely only on improvements in computer hardware technology to enhance processing efficiency; they also need efficient parallel computing technologies that can be applied to multi-core computers and high-performance cluster systems.
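The Map and Reduce phases described in this section can be illustrated with a minimal, single-process Python sketch. It is not Hadoop code and does not use any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the word-count task are assumptions chosen only for illustration. The sketch shows input key-value pairs being mapped to intermediate pairs, intermediate values being grouped by key, and each group being merged into one output pair.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one (doc_id, text) input pair into intermediate (word, 1) pairs."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (done by the framework between the phases)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: merge all intermediate values that share a key into one output pair."""
    return word, sum(counts)


def word_count(documents: Dict[str, str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence in one process (hypothetical driver)."""
    intermediate = [
        pair
        for doc_id, text in documents.items()
        for pair in map_phase(doc_id, text)
    ]
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


if __name__ == "__main__":
    docs = {"d1": "topic model text", "d2": "text mining text"}
    print(word_count(docs))
```

In a real cluster the map calls and the reduce calls would run on different machines, and the grouping step would move data across the network; the sketch only mirrors the data flow between the two user-supplied functions.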
Through a combination of hardware and software, complex calculations can be accelerated, relieving the bottlenecks of computation time and memory capacity. At the same time, parallel computing can save cost to a certain extent and complete large-scale computing tasks with a lower investment [51] [68] [70].

When dealing with mass data, the disadvantage of LDA is that its computational complexity is higher and its processing time longer [17] [71] [72]. How LDA can model large-scale text effectively, meet the requirements on computation time and memory capacity, and finally find latent topic rules is therefore another big issue. To improve the LDA model for processing mass data, the parallel LDA algorithm has become one of the research hotspots. Researching MapReduce parallel algorithms and applying them to topic models has much practical significance [51] [52] [66]. Firstly, parallel algorithms can solve the problems of calculation and storage, and they can process large-scale data more efficiently. Secondly, parallel topic models can address many practical application problems, such as text classification, information retrieval, text summarization and so on [17] [73]. Finally, parallel topic models can analyze massive Internet user behavior data to obtain useful information, which can be used in social networking websites, intelligent search and other Internet products [18] [72].

1.2 Motivation of Work

Automatic text classification involves data mining, computational linguistics, informatics, artificial intelligence and other disciplines, and it is the basis and core of text mining. It is an important application field of natural language processing and a significant technology for processing large-scale information [2] [4] [74]. In short, text classification assigns a large number of documents to one or more categories, where each category represents a different concept or topic. Automatic text classification systems help users organize and obtain information, playing a significant role in improving the speed and precision of information retrieval [5] [8]. Therefore, it has very important research value.

In text classification, the selection and implementation of classification methods are the core parts of classification systems. How to choose an appropriate classification model is an important issue [75] [76]. In addition, text feature selection is another key technology in text classification [77] [78] [79]. The main role of applying a probabilistic topic model to text classification is to reduce the dimensionality of the text representation space by finding latent topic information in documents. In the early stage, the most typical representative was the LSI model. Its dimensionality reduction of the feature space is significant, but its classification performance often decreases [80] [81]. Furthermore, its parameter space grows linearly with the training data because of the high computational complexity of the SVD operation in LSI. These are limitations of LSI when it is applied to practical problems. PLSI can be seen as a probabilistic improvement of LSI, which can describe the probabilistic structure of the latent semantic space [25] [26]. However, PLSI still has the problem that the parameter space grows linearly with the training set [29] [40].

Compared with LSI and PLSI, the LDA model is a completely probabilistic generative model, so it has a complete internal structure and can use established probabilistic algorithms for training and use of the model. In addition, the size of the parameter space of the LDA model has nothing to do with the number of documents. Thus, LDA is more suitable for constructing a text representation model on a large-scale corpus [30] [32]. The LDA model has also been applied successfully in machine learning, information retrieval and many other fields. In text classification research, the LDA model is effective but its performance is not remarkable. The reason is that LDA is an unsupervised learning model, so it cannot be used directly in classification. It needs to be embedded into appropriate classification algorithms [82].
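To make the generative view of LDA referred to above more concrete, the following is a minimal Python sketch of how one document could be generated from a topic mixture. It is an illustration under stated assumptions, not the thesis's implementation: the vocabulary `VOCAB`, the topic-word matrix `TOPIC_WORD`, the Dirichlet prior `alpha` and the function names are all hypothetical, and a Dirichlet draw is obtained by normalising Gamma draws from the standard library.

```python
import random
from typing import List, Tuple

VOCAB = ["pc", "computer", "hospital", "doctor"]  # hypothetical fixed word list

# Hypothetical topic-word matrix: one probability distribution over VOCAB per topic.
TOPIC_WORD = [
    [0.45, 0.45, 0.05, 0.05],  # a "computing" topic
    [0.05, 0.05, 0.45, 0.45],  # a "medical" topic
]


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet sample by normalising independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(
    length: int, alpha: List[float], rng: random.Random
) -> Tuple[List[float], List[str]]:
    """Generate one document: draw a topic mixture, then a topic and a word per position."""
    theta = sample_dirichlet(alpha, rng)  # document-level topic proportions
    words = []
    for _ in range(length):
        topic = rng.choices(range(len(theta)), weights=theta)[0]
        word = rng.choices(VOCAB, weights=TOPIC_WORD[topic])[0]
        words.append(word)
    return theta, words


if __name__ == "__main__":
    rng = random.Random(0)
    theta, words = generate_document(8, [0.5, 0.5], rng)
    print([round(p, 3) for p in theta], words)
```

In this sketch the model-level quantities are `alpha` and `TOPIC_WORD`, whose sizes depend only on the number of topics and the vocabulary size, not on how many documents are generated. The per-document vector `theta` is the low-dimensional topic representation that a classifier such as an SVM could take as input.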

The LDA model itself is a generative model, and it is usually integrated with generative classification algorithms. In this generative integration, the training of classifiers does not use the entire training corpus to obtain a single LDA model; instead, a sub-LDA model is obtained for each category of the training corpus. Each sub-LDA model describes the latent topics of the documents in the corresponding category [32] [83]. Because of the separate training, each category has its own group of topics, and topics of different categories are isolated from each other. However, one of the main problems of this classification method is that the target document is forcibly assigned a distribution over the latent topics of categories to which it does not belong, so that the calculation of the generative probability deviates for these categories and the classification performance is reduced [84]. Therefore, how to choose a suitable classification algorithm to combine with the LDA model and construct an effective classifier has become a challenge.

In the LDA model, topics follow a Dirichlet distribution, which assumes that the occurrence of a topic has nothing to do with the occurrence of other topics. But in real data, many topics are associated with each other. For example, when "PC" appears, the probability of "Computer" appearing is quite high, but "Hospital" is unlikely to appear. Obviously, this independence assumption is inconsistent with real data. So, when the LDA model is used to model the whole text set, the number of topics in the LDA model greatly influences the modeling performance [82]. Therefore, how to determine the optimal number of topics in the LDA model is another research hotspot of LDA.

At present, there are two main methods for determining the optimal number of topics in the LDA model: the selection method based on the Hierarchical Dirichlet Process (HDP) [86] [87] and the standard method in Bayesian statistics [82]. The former uses the nonparametric nature of the Dirichlet Process (DP) to solve the selection problem of the optimal number of topics in LDA. However, it needs to establish both an HDP model and an LDA model for the same data set. Obviously, when processing practical text classification problems, it spends a lot of time on computing [86] [88]. The latter needs a series of values of T to be specified manually, where T stands for the number of topics in LDA. After carrying out the Gibbs sampling algorithm and the related calculation, the value of T in the LDA model can finally be determined [82]. Thus, an algorithm needs to be designed to find the optimal number of topics automatically with low time consumption and little manual operation.

In addition, in text classification, the quality of classifiers greatly affects the final classification result, and the quality of classifiers depends to a great extent on the quality of the training text set. In general, the more accurate the classes of the training text set and the more comprehensive its content, the higher the quality of the obtained classifier [89] [90]. However, in practical applications, it is quite difficult to obtain training text sets of such high quality, especially large-scale ones. The training data unavoidably contains noise, and the noise samples have a significant impact on the final classification result [91]. A mainstream approach to noise processing is to identify and remove noise samples from data sets [90] [91]. At present, there are two main noise processing methods based on LDA: the data smoothing method based on the LDA model [92] and the noise identification method based on the category entropy [93] [94]. They can effectively remove a majority of noise samples from text sets, but they cannot delete all the noise [95]. Furthermore, some normal samples are unavoidably removed as noise by mistake [96]. Therefore, these issues also become a challenge in text classification.

In order to meet the requirements of storage capacity and computation speed, single-core serial programming is gradually giving way to multi-core parallel programming [41] [42]. Parallel programming applied to clusters or servers can overcome some of the limitations that topic model learning algorithms face when processing mass data [69]. So, parallel topic models are able to deal with large-scale data effectively.
But,


More information

A Binary Quantum-behaved Particle Swarm Optimization Algorithm with Cooperative Approach

A Binary Quantum-behaved Particle Swarm Optimization Algorithm with Cooperative Approach IJCSI Internatonal Journal of Computer Scence Issues, Vol., Issue, No, January 3 ISSN (Prnt): 694-784 ISSN (Onlne): 694-84 www.ijcsi.org A Bnary Quantum-behave Partcle Swarm Optmzaton Algorthm wth Cooperatve

More information

A Programming Model for the Cloud Platform

A Programming Model for the Cloud Platform Internatonal Journal of Advanced Scence and Technology A Programmng Model for the Cloud Platform Xaodong Lu School of Computer Engneerng and Scence Shangha Unversty, Shangha 200072, Chna luxaodongxht@qq.com

More information

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur

Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..

More information

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network

Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network 700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School

More information

What is Candidate Sampling

What is Candidate Sampling What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble

More information

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features

On-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features On-Lne Fault Detecton n Wnd Turbne Transmsson System usng Adaptve Flter and Robust Statstcal Features Ruoyu L Remote Dagnostcs Center SKF USA Inc. 3443 N. Sam Houston Pkwy., Houston TX 77086 Emal: ruoyu.l@skf.com

More information

A Practical Study of Regenerating Codes for Peer-to-Peer Backup Systems

A Practical Study of Regenerating Codes for Peer-to-Peer Backup Systems A Practcal Stuy of Regeneratng Coes for Peer-to-Peer Backup Systems Alessanro Dumnuco an Ernst Bersack EURECOM Sopha Antpols, France {umnuco,bersack}@eurecom.fr Abstract In strbute storage systems, erasure

More information

APPLICATION OF BINARY DIVISION ALGORITHM FOR IMAGE ANALYSIS AND CHANGE DETECTION TO IDENTIFY THE HOTSPOTS IN MODIS IMAGES

APPLICATION OF BINARY DIVISION ALGORITHM FOR IMAGE ANALYSIS AND CHANGE DETECTION TO IDENTIFY THE HOTSPOTS IN MODIS IMAGES APPLICATION OF BINARY DIVISION ALGORITHM FOR IMAGE ANALYSIS AND CHANGE DETECTION TO IDENTIFY THE HOTSPOTS IN MODIS IMAGES Harsh Kumar G R * an Dharmenra Sngh (hargrec@tr.ernet.n, harmfec@tr.ernet.n) Department

More information

Title Language Model for Information Retrieval

Title Language Model for Information Retrieval Ttle Language Model for Informaton Retreval Rong Jn Language Technologes Insttute School of Computer Scence Carnege Mellon Unversty Alex G. Hauptmann Computer Scence Department School of Computer Scence

More information

A Load-Balancing Algorithm for Cluster-based Multi-core Web Servers

A Load-Balancing Algorithm for Cluster-based Multi-core Web Servers Journal of Computatonal Informaton Systems 7: 13 (2011) 4740-4747 Avalable at http://www.jofcs.com A Load-Balancng Algorthm for Cluster-based Mult-core Web Servers Guohua YOU, Yng ZHAO College of Informaton

More information

On the Optimal Control of a Cascade of Hydro-Electric Power Stations

On the Optimal Control of a Cascade of Hydro-Electric Power Stations On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;

More information

RSA Cryptography using Designed Processor and MicroBlaze Soft Processor in FPGAs

RSA Cryptography using Designed Processor and MicroBlaze Soft Processor in FPGAs RSA Cryptography usng Desgne Processor an McroBlaze Soft Processor n FPGAs M. Nazrul Islam Monal Dept. of CSE, Rajshah Unversty of Engneerng an Technology, Rajshah-6204, Banglaesh M. Al Mamun Dept. of

More information

An Efficient Recovery Algorithm for Coverage Hole in WSNs

An Efficient Recovery Algorithm for Coverage Hole in WSNs An Effcent Recover Algorthm for Coverage Hole n WSNs Song Ja 1,*, Wang Balng 1, Peng Xuan 1 School of Informaton an Electrcal Engneerng Harbn Insttute of Technolog at Weha, Shanong, Chna Automatc Test

More information

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending

Bayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success

More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

An Interest-Oriented Network Evolution Mechanism for Online Communities

An Interest-Oriented Network Evolution Mechanism for Online Communities An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne

More information

A RELIABLE SEMI-DISTRIBUTED LOAD BALANCING ARCHITECTURE OF HETEROGENEOUS WIRELESS NETWORKS

A RELIABLE SEMI-DISTRIBUTED LOAD BALANCING ARCHITECTURE OF HETEROGENEOUS WIRELESS NETWORKS Internatonal Journal of Computer Networks & Communcatons (IJCNC) Vol.4, No., January 0 A RELIABLE SEMI-DISTRIBUTED LOAD BALANCING ARCHITECTURE OF HETEROGENEOUS WIRELESS NETWORKS M. Golam Rabul Alam, Chayan

More information

A Secure Password-Authenticated Key Agreement Using Smart Cards

A Secure Password-Authenticated Key Agreement Using Smart Cards A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,

More information

320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:

More information

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)

Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching) Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton

More information

Rank Based Clustering For Document Retrieval From Biomedical Databases

Rank Based Clustering For Document Retrieval From Biomedical Databases Jayanth Mancassamy et al /Internatonal Journal on Computer Scence and Engneerng Vol.1(2), 2009, 111-115 Rank Based Clusterng For Document Retreval From Bomedcal Databases Jayanth Mancassamy Department

More information

THE LOAD PLANNING PROBLEM FOR LESS-THAN-TRUCKLOAD MOTOR CARRIERS AND A SOLUTION APPROACH. Professor Naoto Katayama* and Professor Shigeru Yurimoto*

THE LOAD PLANNING PROBLEM FOR LESS-THAN-TRUCKLOAD MOTOR CARRIERS AND A SOLUTION APPROACH. Professor Naoto Katayama* and Professor Shigeru Yurimoto* 7th Internatonal Symposum on Logstcs THE LOAD PLAIG PROBLEM FOR LESS-THA-TRUCKLOAD MOTOR CARRIERS AD A SOLUTIO APPROACH Professor aoto Katayama* an Professor Shgeru Yurmoto* * Faculty of Dstrbuton an Logstcs

More information

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1

Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1 Send Orders for Reprnts to reprnts@benthamscence.ae The Open Cybernetcs & Systemcs Journal, 2014, 8, 115-121 115 Open Access A Load Balancng Strategy wth Bandwdth Constrant n Cloud Computng Jng Deng 1,*,

More information

Mining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System

Mining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System Mnng Feature Importance: Applyng Evolutonary Algorthms wthn a Web-based Educatonal System Behrouz MINAEI-BIDGOLI 1, and Gerd KORTEMEYER 2, and Wllam F. PUNCH 1 1 Genetc Algorthms Research and Applcatons

More information

A novel Method for Data Mining and Classification based on

A novel Method for Data Mining and Classification based on A novel Method for Data Mnng and Classfcaton based on Ensemble Learnng 1 1, Frst Author Nejang Normal Unversty;Schuan Nejang 641112,Chna, E-mal: lhan-gege@126.com Abstract Data mnng has been attached great

More information

On the computation of the capital multiplier in the Fortis Credit Economic Capital model

On the computation of the capital multiplier in the Fortis Credit Economic Capital model On the computaton of the captal multpler n the Forts Cret Economc Captal moel Jan Dhaene 1, Steven Vuffel 2, Marc Goovaerts 1, Ruben Oleslagers 3 Robert Koch 3 Abstract One of the key parameters n the

More information

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm

Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,

More information

Can Auto Liability Insurance Purchases Signal Risk Attitude?

Can Auto Liability Insurance Purchases Signal Risk Attitude? Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang

More information

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications

Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary

More information

An Alternative Way to Measure Private Equity Performance

An Alternative Way to Measure Private Equity Performance An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate

More information

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College

Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure

More information

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION

NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State

More information

The Investment Decision-Making Index System and the Grey Comprehensive Evaluation Method under Hybrid Cloud

The Investment Decision-Making Index System and the Grey Comprehensive Evaluation Method under Hybrid Cloud The Investment Decson-Makn Inex System an the Grey Comprehensve Evaluaton Metho uner Hybr Clou Donln Chen 1, Mn Fu 1, Xueron Jan 2, an Dawe Son 1 1 School o Economcs, WHUT, Wuhan, Chna 2 School o Manaement,

More information

Exploiting Recommendation on Social Media Networks

Exploiting Recommendation on Social Media Networks Internatonal Journal of Scence and Research IJSR) ISSN Onln: 2319-7064 Index Coperncus Value 2013): 6.14 Impact Factor 2013): 4.438 Explotng Recommendaton on Socal Meda Networs Swat A. Adhav 1, Sheetal

More information

IMPACT ANALYSIS OF A CELLULAR PHONE

IMPACT ANALYSIS OF A CELLULAR PHONE 4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng

More information

Credit Limit Optimization (CLO) for Credit Cards

Credit Limit Optimization (CLO) for Credit Cards Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt

More information

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION

Vision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble

More information

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement

An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement An Enhanced Super-Resoluton System wth Improved Image Regstraton, Automatc Image Selecton, and Image Enhancement Yu-Chuan Kuo ( ), Chen-Yu Chen ( ), and Chou-Shann Fuh ( ) Department of Computer Scence

More information

Development of an intelligent system for tool wear monitoring applying neural networks

Development of an intelligent system for tool wear monitoring applying neural networks of Achevements n Materals and Manufacturng Engneerng VOLUME 14 ISSUE 1-2 January-February 2006 Development of an ntellgent system for tool wear montorng applyng neural networks A. Antć a, J. Hodolč a,

More information

Survey on Virtual Machine Placement Techniques in Cloud Computing Environment

Survey on Virtual Machine Placement Techniques in Cloud Computing Environment Survey on Vrtual Machne Placement Technques n Cloud Computng Envronment Rajeev Kumar Gupta and R. K. Paterya Department of Computer Scence & Engneerng, MANIT, Bhopal, Inda ABSTRACT In tradtonal data center

More information

Trust Network and Trust Community Clustering based on Shortest Path Analysis for E-commerce

Trust Network and Trust Community Clustering based on Shortest Path Analysis for E-commerce Internatonal Journal of u- an e- Serce, Scence an Technology Trust Network an Trust Communty Clusterng base on Shortest Path Analyss for E-commerce Shaozhong Zhang 1, Jungan Chen 1, Haong Zhong 2, Zhaox

More information

A heuristic task deployment approach for load balancing

A heuristic task deployment approach for load balancing Xu Gaochao, Dong Yunmeng, Fu Xaodog, Dng Yan, Lu Peng, Zhao Ja Abstract A heurstc task deployment approach for load balancng Gaochao Xu, Yunmeng Dong, Xaodong Fu, Yan Dng, Peng Lu, Ja Zhao * College of

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

Traffic State Estimation in the Traffic Management Center of Berlin

Traffic State Estimation in the Traffic Management Center of Berlin Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal peter.vortsch@ptv.de Peter Möhl, PTV AG,

More information

Realistic Image Synthesis

Realistic Image Synthesis Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random

More information

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance Calbraton Method Instances of the Cell class (one nstance for each FMS cell) contan ADC raw data and methods assocated wth each partcular FMS cell. The calbraton method ncludes event selecton (Class Cell

More information

ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble

ECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble 1 ECE544NA Fnal Project: Robust Machne Learnng Hardware va Classfer Ensemble Sa Zhang, szhang12@llnos.edu Dept. of Electr. & Comput. Eng., Unv. of Illnos at Urbana-Champagn, Urbana, IL, USA Abstract In

More information

LITERATURE REVIEW: VARIOUS PRIORITY BASED TASK SCHEDULING ALGORITHMS IN CLOUD COMPUTING

LITERATURE REVIEW: VARIOUS PRIORITY BASED TASK SCHEDULING ALGORITHMS IN CLOUD COMPUTING LITERATURE REVIEW: VARIOUS PRIORITY BASED TASK SCHEDULING ALGORITHMS IN CLOUD COMPUTING 1 MS. POOJA.P.VASANI, 2 MR. NISHANT.S. SANGHANI 1 M.Tech. [Software Systems] Student, Patel College of Scence and

More information

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008

Risk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008 Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn

More information

Implementation of Deutsch's Algorithm Using Mathcad

Implementation of Deutsch's Algorithm Using Mathcad Implementaton of Deutsch's Algorthm Usng Mathcad Frank Roux The followng s a Mathcad mplementaton of Davd Deutsch's quantum computer prototype as presented on pages - n "Machnes, Logc and Quantum Physcs"

More information

Online Inference of Topics with Latent Dirichlet Allocation

Online Inference of Topics with Latent Dirichlet Allocation Onlne Inference of Topcs wth Latent Drchlet Allocaton Kevn R. Cann Computer Scence Dvson Unversty of Calforna Berkeley, CA 94720 kevn@cs.berkeley.edu Le Sh Helen Wlls Neuroscence Insttute Unversty of Calforna

More information

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School

Robust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management

More information

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S

How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta

More information

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing

A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing A Replcaton-Based and Fault Tolerant Allocaton Algorthm for Cloud Computng Tork Altameem Dept of Computer Scence, RCC, Kng Saud Unversty, PO Box: 28095 11437 Ryadh-Saud Araba Abstract The very large nfrastructure

More information

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *

Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network * JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,

More information

A DATA MINING APPLICATION IN A STUDENT DATABASE

A DATA MINING APPLICATION IN A STUDENT DATABASE JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul

More information

Product Quality and Safety Incident Information Tracking Based on Web

Product Quality and Safety Incident Information Tracking Based on Web Product Qualty and Safety Incdent Informaton Trackng Based on Web News 1 Yuexang Yang, 2 Correspondng Author Yyang Wang, 2 Shan Yu, 2 Jng Q, 1 Hual Ca 1 Chna Natonal Insttute of Standardzaton, Beng 100088,

More information

POLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and

POLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and POLYSA: A Polynomal Algorthm for Non-bnary Constrant Satsfacton Problems wth and Mguel A. Saldo, Federco Barber Dpto. Sstemas Informátcos y Computacón Unversdad Poltécnca de Valenca, Camno de Vera s/n

More information

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification

Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson

More information

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications

Methodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications Methodology to Determne Relatonshps between Performance Factors n Hadoop Cloud Computng Applcatons Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng and

More information

Ants Can Schedule Software Projects

Ants Can Schedule Software Projects Ants Can Schedule Software Proects Broderck Crawford 1,2, Rcardo Soto 1,3, Frankln Johnson 4, and Erc Monfroy 5 1 Pontfca Unversdad Católca de Valparaíso, Chle FrstName.Name@ucv.cl 2 Unversdad Fns Terrae,

More information

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services

An Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao

More information

Fault tolerance in cloud technologies presented as a service

Fault tolerance in cloud technologies presented as a service Internatonal Scentfc Conference Computer Scence 2015 Pavel Dzhunev, PhD student Fault tolerance n cloud technologes presented as a servce INTRODUCTION Improvements n technques for vrtualzaton and performance

More information

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*

Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising* Probablstc Latent Semantc User Segmentaton for Behavoral Targeted Advertsng* Xaohu Wu 1,2, Jun Yan 2, Nng Lu 2, Shucheng Yan 3, Yng Chen 1, Zheng Chen 2 1 Department of Computer Scence Bejng Insttute of

More information

Incentive Compatible Mechanisms for Group Ticket Allocation in Software Maintenance Services

Incentive Compatible Mechanisms for Group Ticket Allocation in Software Maintenance Services 14th Asa-Pacfc Software Engneerng Conference Incentve Compatble Mechansms for Group Tcket Allocaton n Software Mantenance Servces Karthk Subban, Ramakrshnan Kannan IBM R Ina Software Lab, EGL D Block,

More information

Overview of monitoring and evaluation

Overview of monitoring and evaluation 540 Toolkt to Combat Traffckng n Persons Tool 10.1 Overvew of montorng and evaluaton Overvew Ths tool brefly descrbes both montorng and evaluaton, and the dstncton between the two. What s montorng? Montorng

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information

A Simple Approach to Clustering in Excel

A Simple Approach to Clustering in Excel A Smple Approach to Clusterng n Excel Aravnd H Center for Computatonal Engneerng and Networng Amrta Vshwa Vdyapeetham, Combatore, Inda C Rajgopal Center for Computatonal Engneerng and Networng Amrta Vshwa

More information

A General and Practical Datacenter Selection Framework for Cloud Services

A General and Practical Datacenter Selection Framework for Cloud Services 212 IEEE Ffth Internatonal Conference on Clou Computng A General an Practcal Datacenter Selecton Framework for Clou Servces Hong Xu, Baochun L henryxu, bl@eecg.toronto.eu Department of Electrcal an Computer

More information

Enterprise Master Patient Index

Enterprise Master Patient Index Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

Activity Scheduling for Cost-Time Investment Optimization in Project Management

Activity Scheduling for Cost-Time Investment Optimization in Project Management PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng

More information

Searching for Interacting Features for Spam Filtering

Searching for Interacting Features for Spam Filtering Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software

More information

IWFMS: An Internal Workflow Management System/Optimizer for Hadoop

IWFMS: An Internal Workflow Management System/Optimizer for Hadoop IWFMS: An Internal Workflow Management System/Optmzer for Hadoop Lan Lu, Yao Shen Department of Computer Scence and Engneerng Shangha JaoTong Unversty Shangha, Chna lustrve@gmal.com, yshen@cs.sjtu.edu.cn

More information

BERNSTEIN POLYNOMIALS

BERNSTEIN POLYNOMIALS On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful

More information

Study on Model of Risks Assessment of Standard Operation in Rural Power Network

Study on Model of Risks Assessment of Standard Operation in Rural Power Network Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,

More information

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS

METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng

More information

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION

THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh

More information

A Dynamic Load Balancing for Massive Multiplayer Online Game Server

A Dynamic Load Balancing for Massive Multiplayer Online Game Server A Dynamc Load Balancng for Massve Multplayer Onlne Game Server Jungyoul Lm, Jaeyong Chung, Jnryong Km and Kwanghyun Shm Dgtal Content Research Dvson Electroncs and Telecommuncatons Research Insttute Daejeon,

Parallel Numerical Simulation of Visual Neurons for Analysis of Optical Illusion

2012 Third International Conference on Networking and Computing. Akira Egashira, Shunji Satoh, Hidetsugu Irie and Tsutomu Yoshinaga, Graduate

iavenue iavenue i i i iavenue iavenue iavenue

Saratoga Systems' enterprise-wide Avenue CRM system is a comprehensive web-enabled software solution. This next generation system enables you to effectively manage and enhance your customer relationships in both

Luby's Alg. for Maximal Independent Sets using Pairwise Independence

Lecture Notes for Randomized Algorithms. Last updated by Eric Vigoda on February, 006. 8. Maximal Independent Sets. For a graph G = (V, E), an independent

Present Values and Accumulations

ANGUS S. MACDONALD. Volume 3, pp. 1331–1336. In Encyclopedia of Actuarial Science (ISBN 0-470-84676-3), edited by Jozef L. Teugels and Bjørn Sundt. John Wiley & Sons, Ltd, Chichester, 2004.

Gender Classification for Real-Time Audience Analysis System

Vladimir Khryashchev, Lev Shmaglit, Andrey Shemyakov, Anton Lebedev, Yaroslavl State University, Yaroslavl, Russia. vhr@yandex.ru, shmaglt_lev@yahoo.com, andrey.shemakov@gmail.com,

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

Cidália Fonte, Department of Mathematics, Faculty of Sciences and Technology, University of Coimbra, Apartado 38, 3 454 Coimbra,

Oxygen Saturation Measurement and Optimal Accuracy in Nair

The Application of Threshold De-noising in Mobile Oxygen Saturation Monitoring Software. Tang Ning and Xu Zhenzhen, School of Computer Science and Technology, Donghua University, Shanghai, China 201620. tnwysyd@126.com

Data Mining from the Information Systems: Performance Indicators at Masaryk University in Brno

Mikuláš Bek. EUA Workshop, Strasbourg, 1-2 December 2006. Location of Brno

SCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS

Magdalena Rogalska 1, Wojciech Bożejko 2, Zdzisław Hejducki 3. 1 Lublin University of Technology, 2- Lublin, Nadbystrzycka 4., Poland. E-mail: rogalska@akropols.pol.lubln.pl

Characterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University

Characterization of Assembly Variation Analysis Methods. A Thesis Presented to the Department of Mechanical Engineering, Brigham Young University, In Partial Fulfillment of the Requirements for the Degree Master of Science
