High Performance Latent Dirichlet Allocation for Text Mining
- Clifton Ward
- 8 years ago
Transcription
High Performance Latent Dirichlet Allocation for Text Mining

A thesis submitted for the Degree of Doctor of Philosophy

By

Department of Electronic and Computer Engineering
School of Engineering and Design
Brunel University
September 2013
Abstract

Latent Dirichlet Allocation (LDA), a fully probabilistic generative model, is a three-tier Bayesian model. LDA computes the latent topic structure of the data and obtains the significant information of documents. However, traditional LDA has several limitations in practical applications. LDA cannot be used directly in classification because it is an unsupervised learning model; it needs to be embedded into appropriate classification algorithms. Because LDA is a generative model, it normally generates latent topics in categories to which the target documents do not belong, which produces deviations in computation and reduces classification accuracy. The number of topics in LDA greatly influences the learning process of the model parameters. Noise samples in the training data also affect the final text classification result, and the quality of LDA-based classifiers depends to a great extent on the quality of the training samples. Although parallel LDA algorithms have been proposed to deal with huge amounts of data, balancing computing loads in a computer cluster poses another challenge.

This thesis presents a text classification method which combines the LDA model and the Support Vector Machine (SVM) classification algorithm for improved classification accuracy while reducing the dimension of datasets. Based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN), the algorithm automatically optimizes the number of topics to be selected, which reduces the number of iterations in computation. Furthermore, this thesis presents a noise data reduction scheme to process noisy data. Even when the noise ratio in the training data set is large, the noise reduction scheme can still produce a high level of classification accuracy. Finally, the thesis parallelizes LDA using the MapReduce model, which is the de facto computing standard for supporting data-intensive applications. A genetic algorithm based load balancing algorithm is designed to balance the workloads among computers in a heterogeneous MapReduce cluster where the computers have a variety of computing resources in terms of CPU speed, memory space and hard disk space.
Acknowledgement

I would like to thank many people for their help. First of all, I would like to thank my PhD supervisor, Prof. Maozhen Li. Throughout the whole research process, he gave me the most help and guidance from beginning to end. In addition, I have been greatly encouraged by his advice and support, so that I could face all the difficulties. Not only did I learn more about my research, but I also learned how to analyze and solve problems. I also thank Xiaoyu Chen, Yang Lu and Yu Zhao. They gave me many significant opinions, and I am especially grateful for their support and care in everyday life. Moreover, I would like to thank the School of Engineering and Design, and Brunel University. During my PhD research years, I received help in every respect from the School and the University. Finally, I thank my parents, girlfriend and housemates. They gave me the greatest support and courage when I was in the most difficult time. Here, I would like to express my heartfelt thanks to all the friends who have helped me. I shall remember the days I spent at Brunel forever.
Author's Declaration

The work described in this thesis has not been previously submitted for a degree in this or any other university and unless otherwise referenced it is the author's own work.
Statement of Copyright

The copyright of this thesis rests with the author. No quotation from it should be published without his prior written consent and information derived from it should be acknowledged.
List of Abbreviations

AD-LDA: Approximate Distributed LDA
API: Application Program Interface
BP: Belief Propagation
CTM: Correlated Topic Model
DAG: Directed Acyclic Graph
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
DLB: Dynamic Load Balancing
DP: Dirichlet Process
EM: Expectation-Maximization
FIFO: First In First Out
GA: Genetic Algorithm
GFS: Google File System
GS: Gibbs Sampling
HDFS: Hadoop Distributed File System
HD-LDA: Hierarchical Distributed LDA
HDP: Hierarchical Dirichlet Process
ICE: Internet Communications Engine
Intel TBB: Intel Threading Building Blocks
JDK: Java Development Kit
KLD: Kullback-Leibler Divergence
K-NN: K-Nearest Neighbor
LDA: Latent Dirichlet Allocation
LSI: Latent Semantic Indexing
MCMC: Markov Chain Monte Carlo
MPI: Message Passing Interface
PAM: Pachinko Allocation Model
PLDA: Parallel Latent Dirichlet Allocation
PLSI: Probabilistic Latent Semantic Indexing
SGA: Simple Genetic Algorithm
SSH: Secure Shell
SVD: Singular Value Decomposition
SVM: Support Vector Machines
TF-IDF: Term Frequency-Inverse Document Frequency
VB: Variational Bayes
VI: Variational Inference
VSM: Vector Space Model
Table of Contents

Abstract
Acknowledgement
Author's Declaration
Statement of Copyright
List of Abbreviations
Table of Contents
List of Figures
List of Tables

Chapter 1 Introduction
  Background
  Text Mining Techniques
  High Performance Computing for Text Mining
  Motivation of Work
  Major Contributions
  Structure of the Thesis

Chapter 2 Literature Review
  Introduction
  Probability Topic Models
  TF-IDF Model
  Mixture of Unigrams
  LSI Model
  Basic Concepts
  The Modeling Process
  The Advantages of LSI
  The Disadvantages of LSI
  PLSI Model
  Basic Concepts
  The Modeling Process
  The Advantages of PLSI
  The Disadvantages of PLSI
  LDA Model
  Basic Concepts
  The Modeling Process
  The Advantages of LDA
  An Overview of the Main Inference Algorithms of LDA Model Parameters
  Variational Inference (VI)
  Belief Propagation (BP)
  Gibbs Sampling (GS)
  Analysis and Discussion
  An Overview of Genetic Algorithm
  The Basic Idea of Genetic Algorithm
  The Main Steps of Genetic Algorithm
  Coding Mechanism
  Fitness Function
  Selection
  Crossover
  Mutation
  The Parameter Settings of Genetic Algorithms
  An Overview of Hadoop
  HDFS
  MapReduce Programming Model in Hadoop
  Hadoop Scheduling Algorithms
  Divisible Load Theory
  Summary

Chapter 3 Text Classification with Latent Dirichlet Allocation
  Introduction
  Overview of Text Classification
  The Content of Text Classification
  Text Preprocessing
  Text Representation
  Text Feature Extraction and Dimension Reduction
  Text Classification Algorithms
  Classification Performance Evaluation System
  Text Classification based on LDA
  Gibbs Sampling Approximate Inference Parameters of LDA Model
  The Specific Steps of Text Classification
  Experiment and Analysis
  Experimental Environment
  Training Environment for SVM
  The Data Set
  Evaluation Methods
  Experimental Results
  Summary

Chapter 4 Accuracy Enhancement with Optimized Topics and Noise Processing
  Introduction
  The Method of Selecting the Optimal Number of Topics
  Current Main Selection Methods of the Optimal Number of Topics Based on LDA Model
  The Method of Selecting the Optimal Number of Topics Based on HDP
  The Standard Method of Bayesian Statistics
  A Density-based Clustering Method of Selecting the Optimal Number of Topics in LDA
  DBSCAN Algorithm
  The Relationship between the Optimal Model and the Topic Similarity
  A Method of Selecting the Optimal Number of Topics Based on DBSCAN
  Experiment and Result Analysis
  Noisy Data Reduction
  The Noise Problem in Text Classification
  The Current Main LDA-based Methods of Noise Processing
  The Data Smoothing Based on the Generation Process of Topic Model
  The Category Entropy Based on LDA Model
  A Noise Data Reduction Scheme
  The Experiment and Result Analysis
  Summary

Chapter 5 Genetic Algorithm based Static Load Balancing for Parallel Latent Dirichlet Allocation
  Introduction
  The Current Main Parallelization Methods of LDA
  Mahout's Parallelization of LDA
  Yahoo's Parallel Topic Model
  The Algorithm of Parallel LDA Based on Belief Propagation
  Google's PLDA
  Analysis and Discussion
  Paralleling LDA with MapReduce/Hadoop
  The Working Process of MapReduce on Hadoop
  The Algorithm and Implementation of PLDA
  A Static Load Balancing Strategy Based on Genetic Algorithm for PLDA in Hadoop
  The Algorithm Design
  The Design and Implementation of Genetic Algorithm
  Encoding Scheme
  The Initialization of Population
  Fitness Function
  Crossover
  Mutation
  The Optimal Retention Strategy
  Experiment and Analysis
  Evaluating PLDA in Hadoop
  The Experimental Environment
  Experimental Data
  The Experiment in the Homogeneous Environment
  The Experiment in the Heterogeneous Environment
  The Experiment of Load Balancing in a Simulated Environment
  Summary

Chapter 6 Conclusion and Future Works
  Conclusion
  Future Works
  The Supervised LDA
  The Improved PLDA
  PLDA Dynamic Load Balancing Problem
  The Application of Cloud Computing Platform
  The Application of Interdisciplinary Field

References
List of Figures

Figure 2.1: The generative process of the topic model
Figure 2.2: (Left) The unigrams (Right) The mixture of unigrams
Figure 2.3: The diagram of singular value decomposition (SVD)
Figure 2.4: The graphical model representation of PLSI
Figure 2.5: The network topology of LDA latent topics
Figure 2.6: (Left) The structure diagram of LDA latent topics (Right) The graphical model representation of LDA
Figure 2.7: The LDA probabilistic graphical model with the variational parameters
Figure 2.8: Belief propagation in the LDA model based on the factor graph
Figure 2.9: The typical MapReduce framework in Hadoop
Figure 3.1: The typical process of automatic text classification
Figure 3.2: The separating hyperplane of SVM algorithm
Figure 3.3: Comparison of the performance of three methods on each class
Figure 4.1: HDP model
Figure 4.2: The relationship between log P(w|T) and T
Figure 4.3: The data smoothing based on LDA weakens the influence of noise samples
Figure 4.4: The flow diagram of a noise data reduction scheme
Figure 4.5: The relationship between the number of iterations and the effect of noise processing with different noise ratios
Figure 4.6: Classification results of different methods with various noise ratios in first group of data
Figure 4.7: Classification results of different methods with various noise ratios in second group of data
Figure 4.8: Classification results of different methods with various noise ratios in third group of data
Figure 5.1: The framework of one Gibbs sampling iteration in MapReduce-LDA
Figure 5.2: The computing time with different number of data nodes in a homogeneous cluster
Figure 5.3: A comparison of computing time of dealing with different sizes of data with eight data nodes
Figure 5.4: The performance comparison of the load balancing strategy with the different heterogeneity in a simulated environment
Figure 5.5: The performance comparison of PLDA with different sizes of data in a simulated environment
Figure 5.6: The convergence of genetic algorithm in the load balancing strategy
List of Tables

Table 3.1: The distribution of five classes of text in the 20newsgroup corpus
Table 3.2: Comparison of three methods' macro-average and micro-average
Table 3.3: Comparison of three methods' dimensionality reduction degree to corpus
Table 4.1: Results of the proposed algorithm to find the optimal value of topic
Table 4.2: Experimental data sets
Table 5.1: The features comparison of implementing PLDA with MPI and MapReduce
Table 5.2: The configuration of the experimental environment
Table 5.3: The configuration of nodes in a Hadoop cluster
Table 5.4: The experimental result of single machine and one data node in the cluster processing the data
Table 5.5: The configuration of the simulated Hadoop environment
Chapter 1 Introduction

1.1 Background

So far, the Internet has accumulated a huge amount of digital information including news, blogs, web pages, e-books, images, audio, video, social networking and other forms of data, and the amount has been growing explosively [1]. Thus, how people can organize and manage large-scale data effectively and obtain the required useful information quickly has become a huge challenge. For example, the data is too large for traditional data analysis tools and techniques to deal with. Sometimes, even if the data set is relatively small, the traditional methods cannot be used because of the untraditional characteristics of the data [2] [3]. In addition, expert system technology requires knowledge to be put into the knowledge base manually by special users or domain experts. Unfortunately, this process is often subject to deviations and mistakes, and it is time-consuming and costly [4] [5]. Therefore, it is necessary to develop new technologies and automatic tools which can intelligently convert massive data into useful information and knowledge.

Data mining is a technique that combines traditional data analysis methods with complex algorithms that can deal with large amounts of data. It is a complex process in which unknown and valuable modes or rules are extracted from mass data. Furthermore, it is an interdisciplinary field, closely related to database systems, statistics, machine learning, information science and other disciplines [1] [2] [4]. So, data mining can be seen as the result of the natural evolution of information technology. According to the processing object, it can be divided into object data mining, spatial data mining, multimedia data mining, Web mining and text mining [3] [6].
Text is the most important representation of information. Statistical research showed that 80 percent of the information in an organization was stored in the form of text, including books, research papers, news articles, Web pages, e-mail and so on [7]. Text is able to express vast and abundant information, and it also contains a great deal of undetected potential knowledge. A text collection as a whole is not structured data and lacks machine-understandable semantics, so it is quite difficult to deal with a huge number of documents. So, in the field of data mining, a new technology which can process such text data effectively was proposed, called text mining [5] [6] [8].

1.1.1 Text Mining Techniques

Text mining was first proposed by Ronen Feldman et al. in 1995, who described it as "the process of extracting interesting patterns from very large text collections for the purpose of discovering knowledge" [6]. Text mining is also known as text data mining or knowledge discovery in texts, a process in which unknown, potential, understandable and useful knowledge can be found from mass text data [2] [6] [9] [10]. It is also a process of analyzing text data, extracting text information and finding out text knowledge [9] [11] [12].

The main techniques of text mining include text classification, text clustering, text summarization, correlation analysis, information extraction, distribution analysis, trend prediction and so on [13] [14] [15]. Here, text classification and text clustering are mining methods whose object is the text set, whereas the processing object of text summarization and information extraction is a single document. There are many classification methods in text classification, and the frequently used methods include Naive Bayes (NB), K-Nearest Neighbor (K-NN), Support Vector Machines (SVM), Vector Space Model (VSM) and Linear Least Square Fit (LLSF) [6] [7] [8] [9] [12] [13] [14] [15].
In the field of text mining, machine learning experts researched and put forward probabilistic topic models, and fast unsupervised machine learning algorithms were used to find hidden information in text automatically [16] [17] [18] [19]. At present, the main models for mining latent semantic knowledge are Latent Semantic Indexing (LSI) [20] [21] [22] [23] [24], Probabilistic Latent Semantic Indexing (PLSI) [25] [26] [27] [28] [29] and Latent Dirichlet Allocation (LDA) [30] [31] [32]. Their applications cover almost all areas of text mining and information processing, such as text summarization, information retrieval and text classification [33] [34] [35]. In particular, Blei et al. put forward the LDA model in 2003, which has been widely used to solve text classification, text annotation and other problems. Besides, it gave rise to a series of text processing methods based on probabilistic topic modeling [17] [36], and it was extended to images, audio, video and other multimedia data processing fields [37] [38].

LDA is a probabilistic topic model which models discrete data sets such as text sets [30]. It treats documents as probability distributions over topics and simplifies the generative process of the text, which helps to handle large-scale text sets efficiently. In addition, LDA is a three-tier Bayesian model with a three-tier structure of words, topics and documents. Each document is expressed as a mixture of topics, where each topic is a probability distribution over a fixed word list. In brief, the basic modeling process of LDA is as follows. First, a document-word co-occurrence matrix is established. Then, a text training set is modeled. Next, inference methods are used to obtain the model parameters, such as the document-topic matrix and the topic-word matrix. Finally, the learned model is used to predict the topic probability distribution of new documents so as to express text information [17] [30] [32]. Compared with other topic models, LDA has some unique advantages: Firstly, LDA
topic model is a completely probabilistic generative model, which can be trained directly with mature probabilistic algorithms, and the model is easy to use [39]. Secondly, the size of the parameter space of LDA is fixed and is independent of the size of the text set, so it is more suitable for large-scale text sets [40]. Thirdly, LDA is a hierarchical model, which is more stable than non-hierarchical models [39].

1.1.2 High Performance Computing for Text Mining

In terms of processing speed, storage space, fault tolerance and access speed, the traditional technical architecture and the single computer with a serial approach are increasingly unsuitable for handling mass data [41] [42] [43] [44]. Parallel computing is an effective method of improving the computation speed and processing capacity of a computer system. Its basic idea is to decompose a problem into several parts, each of which is computed by an independent processor in parallel [45] [46] [47]. A parallel computing system can be a supercomputer with multiple processors, or a cluster made up of several independent computers that are interconnected in some way [44] [48]. However, for parallel programming models in traditional high-performance computing, such as PThread, MPI and OpenMP [49] [50], developers have to understand low-level configurations and parallel implementation details.

In 2004, Jeffrey Dean and Sanjay Ghemawat, two engineers at Google, proposed the MapReduce programming idea [51] and applied it to the parallel distributed computing of large-scale data sets. Following functional programming ideas, the MapReduce framework divides a computational process into two phases, Map and Reduce [52] [53] [54]. In the Map phase, the input data is passed to map functions in the form of key-value pairs. After the operations, an intermediate set of key-value pairs is generated. In the Reduce phase, all the intermediate values with the same key are merged. The final result is output in the form of key-value
pairs. Users only need to implement two functions, map and reduce, which significantly reduces the complexity of designing parallel programs [51] [52].

Then, Google CEO Eric Schmidt first proposed the concept of cloud computing in 2006 [43]. The major international Internet and information technology companies launched a series of products in succession, showing their research results [55] [56]. For example, Google launched Google App Engine, a development platform based on the cloud computing environment. Amazon launched Amazon EC2 (Amazon Elastic Compute Cloud), a powerful cloud computing platform, and Amazon S3 (Amazon Simple Storage Service), which provides a cloud storage service. Microsoft launched Windows Azure, a cloud computing platform. Cloud computing is the result of the comprehensive development of parallel computing, distributed computing and grid computing. In other words, cloud computing is the commercial implementation of these computing science concepts [43] [57]. Cloud computing is a new computing mode, a new composite mode of computer resources, and an innovative business mode [55]. The basic principle of cloud computing is to distribute the calculation, transmission and storage required on local computers or remote servers across a large number of distributed computers, so that the computers share the huge tasks of calculation, transmission and storage together. Users can obtain the corresponding services through the network, which effectively improves the resource utilization rate and realizes resource acquisition on demand [58] [59].

The above cloud computing platforms are commercial products, which are not freely available to most researchers. Thus, the emergence of open source cloud computing platforms gives most researchers an opportunity. Researchers can use these open source projects to build a cluster system made up of several machines in a laboratory environment, and simulate the cloud computing environment of business systems [57] [58].
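The Map and Reduce phases described earlier in this section can be sketched in a few lines of single-process Python. This is only an illustration of the programming model on a word-count task, not Hadoop code and not the thesis's implementation: the function names (map_phase, shuffle, reduce_phase, run_job) and the two toy documents are assumptions made for the example, and the grouping step that a real framework performs across machines is simulated here with an in-memory dictionary.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: emit an intermediate (word, 1) pair for every word of one document."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Group intermediate values by key (done by the framework between the phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all intermediate values that share a key into one output pair."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle and reduce in sequence on a dict of {doc_id: text}."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_phase(doc_id, text))
    return dict(reduce_phase(key, values) for key, values in shuffle(intermediate).items())


# Hypothetical toy input, used only for illustration.
docs = {"d1": "topic model for text", "d2": "text mining with topic model"}
word_counts = run_job(docs)
```

In this sketch the user-written parts are only map_phase and reduce_phase; everything else stands in for what the framework would provide, which is the point the text makes about reduced programming complexity.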
Hadoop is one of the most famous open source distributed computing frameworks, and it implements the main technologies of Google's cloud computing [60] [61] [62]. Basically, its core consists of the Hadoop Distributed File System (HDFS) [53] [60] [61] [63] and the MapReduce programming model [64] [65] [66]. Hadoop originated in the Lucene and Nutch projects, and then developed into a separate project of the Apache foundation. It is mainly used for processing mass data [53] [60] [63]. Besides, it provides a highly portable Java-based MapReduce framework, which allows distributed computing to be applied to large low-cost clusters. Meanwhile, for individuals and companies, it offers high performance and so lays the foundation for the research and application of their distributed computing [67] [68]. Yahoo was the first company to use, research and develop Hadoop in depth. At Yahoo, more than ten thousand CPUs in a single cluster ran Hadoop, and hundreds of researchers used Hadoop [52] [54]. Nowadays, it has become a mainstream technology of cloud computing; for example, universities, research institutions and Internet companies research and use it [67]. Along with the increasing popularity of the cloud computing concept, Hadoop will get more attention and faster development [65].

According to the digital information report published by International Data Corporation (IDC) in 2011, the amount of global information would double every two years [1]. There is no doubt that this will put great pressure on the data storage and computing power of servers, so that traditional topic model algorithms cannot be used to process large-scale text sets even on high-performance servers [44] [69]. In addition, people cannot rely only on improvements in computer hardware technology to enhance processing efficiency; they need to apply efficient parallel computing technologies to multi-core computers and high-performance cluster systems.
Through a combination of hardware and software, accelerated processing can be accomplished in complex calculations, solving the bottleneck of computation time and memory capacity. At the same time, parallel computing can save cost to a certain extent, and complete large-scale computing tasks with a lower investment [51] [68] [70].

When dealing with mass data, the disadvantage of LDA is that the computational complexity is higher and the processing time is longer [17] [71] [72]. Thus, how LDA can model large-scale text effectively, meet the requirements of computation time and memory capacity, and finally find out latent topic rules is another big issue. In order to improve the LDA model in processing mass data, the parallel LDA algorithm has become one of the research hotspots. Researching MapReduce parallel algorithms and applying them to topic models has much practical significance [51] [52] [66]. Firstly, parallel algorithms can solve the problem of calculation and storage, and they can process large-scale data more efficiently. Secondly, parallel topic models are able to address many practical application problems, such as text classification, information retrieval, text summarization and so on [17] [73]. Finally, parallel topic models can analyze massive Internet user behavior data to obtain useful information, which can be used in social networking websites, intelligent search and other Internet products [18] [72].

1.2 Motivation of Work

Automatic text classification, which is the basis and core of text mining, involves data mining, computational linguistics, informatics, artificial intelligence and other disciplines. It is an important application field of natural language processing, and also a significant application technology for processing large-scale information [2] [4] [74]. In short, text classification technology assigns a large number of documents to one or a group of categories, where each category represents a different concept or topic. Automatic text classification systems can help users organize and obtain information well, playing a significant role in improving the speed and
precision of information retrieval [5] [8]. Therefore, it has a very important research value. In text classification, the selection and implementation of classification methods are the core parts of classification systems. How to choose an appropriate classification model is an important issue [75] [76]. In addition, text feature selection is another key technology in text classification [77] [78] [79].

The main role of applying probabilistic topic models to text classification is to achieve dimensionality reduction of the text data representation space by finding latent topic information in documents. In the early stage, the most typical representative was the LSI model. The dimensionality reduction effect on its feature space is significant, but its classification performance often decreases [80] [81]. Furthermore, its parameter space grows linearly with the training data, and the SVD operation in LSI has high computational complexity. These are limitations of LSI when it is applied to practical problems. PLSI can be seen as an improved probabilistic version of LSI, which can describe the probabilistic structure of the latent semantic space [25] [26]. However, PLSI still has the problem of the parameter space growing linearly with the training set [29] [40].

Compared with LSI and PLSI, the LDA model is a completely probabilistic generative model, so it has a complete internal structure and can be trained and applied with established probabilistic algorithms. In addition, the size of the parameter space of the LDA model is independent of the number of documents. Thus, LDA is more suitable for constructing a text representation model on a large-scale corpus [30] [32]. The LDA model has also been applied successfully in machine learning, information retrieval and many other fields. In the research of text classification, the LDA model is effective but its performance is not remarkable. The reason is that LDA is an unsupervised learning model, so it cannot be used directly in classification. It needs to be embedded into appropriate classification algorithms [82].
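The dimensionality reduction role of LDA described above can be illustrated with a minimal collapsed Gibbs sampling sketch in plain Python. It is a toy under stated assumptions, not the implementation used in the thesis: the function name lda_gibbs, the hyperparameter values alpha and beta, the iteration count, and the six-word vocabulary are all hypothetical choices for the example. The sketch turns each document, given as a list of word ids, into a short vector of topic proportions, which is the kind of reduced representation that could then be passed to a separate classifier such as an SVM.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic-proportion vector per document."""
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]               # topic counts per document
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                              # total words per topic
    assign = []                                                 # topic of every word token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment of this token from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed document-topic proportions: the reduced feature vectors.
    return [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]


# Hypothetical toy corpus; word ids index into this assumed vocabulary.
vocabulary = ["topic", "model", "text", "cluster", "hadoop", "mapreduce"]
docs = [[0, 1, 2, 0, 1], [2, 0, 1, 2], [3, 4, 5, 4], [5, 4, 3, 5, 3]]
theta = lda_gibbs(docs, num_topics=2, vocab_size=len(vocabulary))
```

Each row of theta has num_topics entries regardless of the vocabulary size, which is the sense in which the topic representation reduces the dimension of the feature space. The sketch stops at the features; choosing and training the classifier is a separate step.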
The LDA model itself is a generative model, and it is usually integrated with generative classification algorithms. In this generative integrated mode, the training of classifiers does not use the entire training corpus to obtain a single LDA model; instead, a sub-LDA model is obtained for each category of the training corpus. In effect, each sub-LDA model describes the latent topics of the documents in the corresponding category [32] [83]. Because of the separate training, each category shares a group of topics, and topics in different categories are isolated from each other. However, one of the main problems of this classification method is that a target document is forced to be assigned a distribution over the latent topics of categories to which it does not belong, so that the calculated generative probability deviates in these categories and the classification performance is reduced [84]. Therefore, how to choose a suitable classification algorithm to combine with the LDA model in order to construct an effective classifier has become a challenge.

In the LDA model, topics obey a Dirichlet distribution, which assumes that the emergence of a topic is independent of the emergence of other topics. But in real data, many topics are associated with each other. For example, when "PC" appears, the probability of "Computer" appearing is quite high, but "Hospital" is unlikely to appear. Obviously, this independence assumption is inconsistent with real data. So, when using the LDA model to model the whole text set, the number of topics in the LDA model greatly influences the performance of modeling [82]. Therefore, how to determine the optimal number of topics in the LDA model is another research hotspot of LDA. At present, there are two main methods of determining the optimal number of topics in the LDA model: the selection method based on the Hierarchical Dirichlet Process (HDP) [86] [87] and the standard method in Bayesian statistics [82]. The former uses the nonparametric feature of the Dirichlet Process (DP) to solve the selection problem of the optimal number of topics in LDA.
But it needs to establish an HDP model and an LDA model separately for the same data set. Obviously, when processing practical text classification problems, it spends a lot of time in computing [86] [88]. The latter needs a series of values of T to be specified manually, where T stands for the number of topics in LDA. After carrying out the Gibbs sampling algorithm and the related calculation, the value of T in the LDA model can finally be determined [82]. Thus, an algorithm needs to be designed to find the optimal number of topics automatically with little time and few operations.

In addition, in text classification, the quality of classifiers greatly affects the final classification result, and the quality of classifiers depends to a great extent on the quality of the training text set. In general, if the classes of the training text set are more accurate and its content is more comprehensive, the quality of the obtained classifier will be higher [89] [90]. However, in practical applications, it is quite difficult to obtain training text sets of such high quality, especially large-scale ones. Usually, the training data unavoidably contains noise, and the noise samples have a significant impact on the final classification result [91]. A mainstream approach to noise processing is to identify and remove noise samples from data sets [90] [91]. At present, there are two main noise processing methods based on LDA: the data smoothing method based on the LDA model [92] and the noise identification method based on the category entropy [93] [94]. They can effectively remove a majority of noise samples from text sets to some extent, but they cannot delete all the noise completely [95]. Furthermore, some normal samples will unavoidably be removed wrongly as noise [96]. Therefore, these issues also become a challenge in text classification.

In order to meet the requirements of storage capacity and computation speed, single-core serial programming is gradually giving way to multi-core parallel programming technology [41] [42]. Parallel programming applied to clusters or servers can overcome some limitations that arise when topic model learning algorithms process mass data [69]. So, parallel topic models are able to deal with large-scale data effectively. But,
More informationModule 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..
More informationForecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network
700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School
More informationWhat is Candidate Sampling
What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble
More informationOn-Line Fault Detection in Wind Turbine Transmission System using Adaptive Filter and Robust Statistical Features
On-Lne Fault Detecton n Wnd Turbne Transmsson System usng Adaptve Flter and Robust Statstcal Features Ruoyu L Remote Dagnostcs Center SKF USA Inc. 3443 N. Sam Houston Pkwy., Houston TX 77086 Emal: ruoyu.l@skf.com
More informationA Practical Study of Regenerating Codes for Peer-to-Peer Backup Systems
A Practcal Stuy of Regeneratng Coes for Peer-to-Peer Backup Systems Alessanro Dumnuco an Ernst Bersack EURECOM Sopha Antpols, France {umnuco,bersack}@eurecom.fr Abstract In strbute storage systems, erasure
More informationAPPLICATION OF BINARY DIVISION ALGORITHM FOR IMAGE ANALYSIS AND CHANGE DETECTION TO IDENTIFY THE HOTSPOTS IN MODIS IMAGES
APPLICATION OF BINARY DIVISION ALGORITHM FOR IMAGE ANALYSIS AND CHANGE DETECTION TO IDENTIFY THE HOTSPOTS IN MODIS IMAGES Harsh Kumar G R * an Dharmenra Sngh (hargrec@tr.ernet.n, harmfec@tr.ernet.n) Department
More informationTitle Language Model for Information Retrieval
Ttle Language Model for Informaton Retreval Rong Jn Language Technologes Insttute School of Computer Scence Carnege Mellon Unversty Alex G. Hauptmann Computer Scence Department School of Computer Scence
More informationA Load-Balancing Algorithm for Cluster-based Multi-core Web Servers
Journal of Computatonal Informaton Systems 7: 13 (2011) 4740-4747 Avalable at http://www.jofcs.com A Load-Balancng Algorthm for Cluster-based Mult-core Web Servers Guohua YOU, Yng ZHAO College of Informaton
More informationOn the Optimal Control of a Cascade of Hydro-Electric Power Stations
On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;
More informationRSA Cryptography using Designed Processor and MicroBlaze Soft Processor in FPGAs
RSA Cryptography usng Desgne Processor an McroBlaze Soft Processor n FPGAs M. Nazrul Islam Monal Dept. of CSE, Rajshah Unversty of Engneerng an Technology, Rajshah-6204, Banglaesh M. Al Mamun Dept. of
More informationAn Efficient Recovery Algorithm for Coverage Hole in WSNs
An Effcent Recover Algorthm for Coverage Hole n WSNs Song Ja 1,*, Wang Balng 1, Peng Xuan 1 School of Informaton an Electrcal Engneerng Harbn Insttute of Technolog at Weha, Shanong, Chna Automatc Test
More informationBayesian Network Based Causal Relationship Identification and Funding Success Prediction in P2P Lending
Proceedngs of 2012 4th Internatonal Conference on Machne Learnng and Computng IPCSIT vol. 25 (2012) (2012) IACSIT Press, Sngapore Bayesan Network Based Causal Relatonshp Identfcaton and Fundng Success
More informationMultiple-Period Attribution: Residuals and Compounding
Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens
More informationAn Interest-Oriented Network Evolution Mechanism for Online Communities
An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne
More informationA RELIABLE SEMI-DISTRIBUTED LOAD BALANCING ARCHITECTURE OF HETEROGENEOUS WIRELESS NETWORKS
Internatonal Journal of Computer Networks & Communcatons (IJCNC) Vol.4, No., January 0 A RELIABLE SEMI-DISTRIBUTED LOAD BALANCING ARCHITECTURE OF HETEROGENEOUS WIRELESS NETWORKS M. Golam Rabul Alam, Chayan
More informationA Secure Password-Authenticated Key Agreement Using Smart Cards
A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,
More information320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:
More informationFace Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)
Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton
More informationRank Based Clustering For Document Retrieval From Biomedical Databases
Jayanth Mancassamy et al /Internatonal Journal on Computer Scence and Engneerng Vol.1(2), 2009, 111-115 Rank Based Clusterng For Document Retreval From Bomedcal Databases Jayanth Mancassamy Department
More informationTHE LOAD PLANNING PROBLEM FOR LESS-THAN-TRUCKLOAD MOTOR CARRIERS AND A SOLUTION APPROACH. Professor Naoto Katayama* and Professor Shigeru Yurimoto*
7th Internatonal Symposum on Logstcs THE LOAD PLAIG PROBLEM FOR LESS-THA-TRUCKLOAD MOTOR CARRIERS AD A SOLUTIO APPROACH Professor aoto Katayama* an Professor Shgeru Yurmoto* * Faculty of Dstrbuton an Logstcs
More informationOpen Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1
Send Orders for Reprnts to reprnts@benthamscence.ae The Open Cybernetcs & Systemcs Journal, 2014, 8, 115-121 115 Open Access A Load Balancng Strategy wth Bandwdth Constrant n Cloud Computng Jng Deng 1,*,
More informationMining Feature Importance: Applying Evolutionary Algorithms within a Web-based Educational System
Mnng Feature Importance: Applyng Evolutonary Algorthms wthn a Web-based Educatonal System Behrouz MINAEI-BIDGOLI 1, and Gerd KORTEMEYER 2, and Wllam F. PUNCH 1 1 Genetc Algorthms Research and Applcatons
More informationA novel Method for Data Mining and Classification based on
A novel Method for Data Mnng and Classfcaton based on Ensemble Learnng 1 1, Frst Author Nejang Normal Unversty;Schuan Nejang 641112,Chna, E-mal: lhan-gege@126.com Abstract Data mnng has been attached great
More informationOn the computation of the capital multiplier in the Fortis Credit Economic Capital model
On the computaton of the captal multpler n the Forts Cret Economc Captal moel Jan Dhaene 1, Steven Vuffel 2, Marc Goovaerts 1, Ruben Oleslagers 3 Robert Koch 3 Abstract One of the key parameters n the
More informationDocument Clustering Analysis Based on Hybrid PSO+K-means Algorithm
Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,
More informationCan Auto Liability Insurance Purchases Signal Risk Attitude?
Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang
More informationDescriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications
CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary
More informationAn Alternative Way to Measure Private Equity Performance
An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate
More informationFeature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College
Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure
More informationNEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION
NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State
More informationThe Investment Decision-Making Index System and the Grey Comprehensive Evaluation Method under Hybrid Cloud
The Investment Decson-Makn Inex System an the Grey Comprehensve Evaluaton Metho uner Hybr Clou Donln Chen 1, Mn Fu 1, Xueron Jan 2, an Dawe Son 1 1 School o Economcs, WHUT, Wuhan, Chna 2 School o Manaement,
More informationExploiting Recommendation on Social Media Networks
Internatonal Journal of Scence and Research IJSR) ISSN Onln: 2319-7064 Index Coperncus Value 2013): 6.14 Impact Factor 2013): 4.438 Explotng Recommendaton on Socal Meda Networs Swat A. Adhav 1, Sheetal
More informationIMPACT ANALYSIS OF A CELLULAR PHONE
4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng
More informationCredit Limit Optimization (CLO) for Credit Cards
Credt Lmt Optmzaton (CLO) for Credt Cards Vay S. Desa CSCC IX, Ednburgh September 8, 2005 Copyrght 2003, SAS Insttute Inc. All rghts reserved. SAS Propretary Agenda Background Tradtonal approaches to credt
More informationVision Mouse. Saurabh Sarkar a* University of Cincinnati, Cincinnati, USA ABSTRACT 1. INTRODUCTION
Vson Mouse Saurabh Sarkar a* a Unversty of Cncnnat, Cncnnat, USA ABSTRACT The report dscusses a vson based approach towards trackng of eyes and fngers. The report descrbes the process of locatng the possble
More informationAn Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement
An Enhanced Super-Resoluton System wth Improved Image Regstraton, Automatc Image Selecton, and Image Enhancement Yu-Chuan Kuo ( ), Chen-Yu Chen ( ), and Chou-Shann Fuh ( ) Department of Computer Scence
More informationDevelopment of an intelligent system for tool wear monitoring applying neural networks
of Achevements n Materals and Manufacturng Engneerng VOLUME 14 ISSUE 1-2 January-February 2006 Development of an ntellgent system for tool wear montorng applyng neural networks A. Antć a, J. Hodolč a,
More informationSurvey on Virtual Machine Placement Techniques in Cloud Computing Environment
Survey on Vrtual Machne Placement Technques n Cloud Computng Envronment Rajeev Kumar Gupta and R. K. Paterya Department of Computer Scence & Engneerng, MANIT, Bhopal, Inda ABSTRACT In tradtonal data center
More informationTrust Network and Trust Community Clustering based on Shortest Path Analysis for E-commerce
Internatonal Journal of u- an e- Serce, Scence an Technology Trust Network an Trust Communty Clusterng base on Shortest Path Analyss for E-commerce Shaozhong Zhang 1, Jungan Chen 1, Haong Zhong 2, Zhaox
More informationA heuristic task deployment approach for load balancing
Xu Gaochao, Dong Yunmeng, Fu Xaodog, Dng Yan, Lu Peng, Zhao Ja Abstract A heurstc task deployment approach for load balancng Gaochao Xu, Yunmeng Dong, Xaodong Fu, Yan Dng, Peng Lu, Ja Zhao * College of
More informationA hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm
Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel
More informationTraffic State Estimation in the Traffic Management Center of Berlin
Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal peter.vortsch@ptv.de Peter Möhl, PTV AG,
More informationRealistic Image Synthesis
Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random
More information) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance
Calbraton Method Instances of the Cell class (one nstance for each FMS cell) contan ADC raw data and methods assocated wth each partcular FMS cell. The calbraton method ncludes event selecton (Class Cell
More informationECE544NA Final Project: Robust Machine Learning Hardware via Classifier Ensemble
1 ECE544NA Fnal Project: Robust Machne Learnng Hardware va Classfer Ensemble Sa Zhang, szhang12@llnos.edu Dept. of Electr. & Comput. Eng., Unv. of Illnos at Urbana-Champagn, Urbana, IL, USA Abstract In
More informationLITERATURE REVIEW: VARIOUS PRIORITY BASED TASK SCHEDULING ALGORITHMS IN CLOUD COMPUTING
LITERATURE REVIEW: VARIOUS PRIORITY BASED TASK SCHEDULING ALGORITHMS IN CLOUD COMPUTING 1 MS. POOJA.P.VASANI, 2 MR. NISHANT.S. SANGHANI 1 M.Tech. [Software Systems] Student, Patel College of Scence and
More informationRisk-based Fatigue Estimate of Deep Water Risers -- Course Project for EM388F: Fracture Mechanics, Spring 2008
Rsk-based Fatgue Estmate of Deep Water Rsers -- Course Project for EM388F: Fracture Mechancs, Sprng 2008 Chen Sh Department of Cvl, Archtectural, and Envronmental Engneerng The Unversty of Texas at Austn
More informationImplementation of Deutsch's Algorithm Using Mathcad
Implementaton of Deutsch's Algorthm Usng Mathcad Frank Roux The followng s a Mathcad mplementaton of Davd Deutsch's quantum computer prototype as presented on pages - n "Machnes, Logc and Quantum Physcs"
More informationOnline Inference of Topics with Latent Dirichlet Allocation
Onlne Inference of Topcs wth Latent Drchlet Allocaton Kevn R. Cann Computer Scence Dvson Unversty of Calforna Berkeley, CA 94720 kevn@cs.berkeley.edu Le Sh Helen Wlls Neuroscence Insttute Unversty of Calforna
More informationRobust Design of Public Storage Warehouses. Yeming (Yale) Gong EMLYON Business School
Robust Desgn of Publc Storage Warehouses Yemng (Yale) Gong EMLYON Busness School Rene de Koster Rotterdam school of management, Erasmus Unversty Abstract We apply robust optmzaton and revenue management
More informationHow To Know The Components Of Mean Squared Error Of Herarchcal Estmator S
S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta
More informationA Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing
A Replcaton-Based and Fault Tolerant Allocaton Algorthm for Cloud Computng Tork Altameem Dept of Computer Scence, RCC, Kng Saud Unversty, PO Box: 28095 11437 Ryadh-Saud Araba Abstract The very large nfrastructure
More informationData Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,
More informationA DATA MINING APPLICATION IN A STUDENT DATABASE
JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul
More informationProduct Quality and Safety Incident Information Tracking Based on Web
Product Qualty and Safety Incdent Informaton Trackng Based on Web News 1 Yuexang Yang, 2 Correspondng Author Yyang Wang, 2 Shan Yu, 2 Jng Q, 1 Hual Ca 1 Chna Natonal Insttute of Standardzaton, Beng 100088,
More informationPOLYSA: A Polynomial Algorithm for Non-binary Constraint Satisfaction Problems with and
POLYSA: A Polynomal Algorthm for Non-bnary Constrant Satsfacton Problems wth and Mguel A. Saldo, Federco Barber Dpto. Sstemas Informátcos y Computacón Unversdad Poltécnca de Valenca, Camno de Vera s/n
More informationLogistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification
Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson
More informationMethodology to Determine Relationships between Performance Factors in Hadoop Cloud Computing Applications
Methodology to Determne Relatonshps between Performance Factors n Hadoop Cloud Computng Applcatons Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng and
More informationAnts Can Schedule Software Projects
Ants Can Schedule Software Proects Broderck Crawford 1,2, Rcardo Soto 1,3, Frankln Johnson 4, and Erc Monfroy 5 1 Pontfca Unversdad Católca de Valparaíso, Chle FrstName.Name@ucv.cl 2 Unversdad Fns Terrae,
More informationAn Evaluation of the Extended Logistic, Simple Logistic, and Gompertz Models for Forecasting Short Lifecycle Products and Services
An Evaluaton of the Extended Logstc, Smple Logstc, and Gompertz Models for Forecastng Short Lfecycle Products and Servces Charles V. Trappey a,1, Hsn-yng Wu b a Professor (Management Scence), Natonal Chao
More informationFault tolerance in cloud technologies presented as a service
Internatonal Scentfc Conference Computer Scence 2015 Pavel Dzhunev, PhD student Fault tolerance n cloud technologes presented as a servce INTRODUCTION Improvements n technques for vrtualzaton and performance
More informationProbabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*
Probablstc Latent Semantc User Segmentaton for Behavoral Targeted Advertsng* Xaohu Wu 1,2, Jun Yan 2, Nng Lu 2, Shucheng Yan 3, Yng Chen 1, Zheng Chen 2 1 Department of Computer Scence Bejng Insttute of
More informationIncentive Compatible Mechanisms for Group Ticket Allocation in Software Maintenance Services
14th Asa-Pacfc Software Engneerng Conference Incentve Compatble Mechansms for Group Tcket Allocaton n Software Mantenance Servces Karthk Subban, Ramakrshnan Kannan IBM R Ina Software Lab, EGL D Block,
More informationOverview of monitoring and evaluation
540 Toolkt to Combat Traffckng n Persons Tool 10.1 Overvew of montorng and evaluaton Overvew Ths tool brefly descrbes both montorng and evaluaton, and the dstncton between the two. What s montorng? Montorng
More informationLatent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006
Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model
More informationA Simple Approach to Clustering in Excel
A Smple Approach to Clusterng n Excel Aravnd H Center for Computatonal Engneerng and Networng Amrta Vshwa Vdyapeetham, Combatore, Inda C Rajgopal Center for Computatonal Engneerng and Networng Amrta Vshwa
More informationA General and Practical Datacenter Selection Framework for Cloud Services
212 IEEE Ffth Internatonal Conference on Clou Computng A General an Practcal Datacenter Selecton Framework for Clou Servces Hong Xu, Baochun L henryxu, bl@eecg.toronto.eu Department of Electrcal an Computer
More informationEnterprise Master Patient Index
Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an
More informationCS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements
Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there
More informationActivity Scheduling for Cost-Time Investment Optimization in Project Management
PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng
More informationSearching for Interacting Features for Spam Filtering
Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software
More informationIWFMS: An Internal Workflow Management System/Optimizer for Hadoop
IWFMS: An Internal Workflow Management System/Optmzer for Hadoop Lan Lu, Yao Shen Department of Computer Scence and Engneerng Shangha JaoTong Unversty Shangha, Chna lustrve@gmal.com, yshen@cs.sjtu.edu.cn
More informationBERNSTEIN POLYNOMIALS
On-Lne Geometrc Modelng Notes BERNSTEIN POLYNOMIALS Kenneth I. Joy Vsualzaton and Graphcs Research Group Department of Computer Scence Unversty of Calforna, Davs Overvew Polynomals are ncredbly useful
More informationStudy on Model of Risks Assessment of Standard Operation in Rural Power Network
Study on Model of Rsks Assessment of Standard Operaton n Rural Power Network Qngj L 1, Tao Yang 2 1 Qngj L, College of Informaton and Electrcal Engneerng, Shenyang Agrculture Unversty, Shenyang 110866,
More informationMETHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS
METHODOLOGY TO DETERMINE RELATIONSHIPS BETWEEN PERFORMANCE FACTORS IN HADOOP CLOUD COMPUTING APPLICATIONS Lus Eduardo Bautsta Vllalpando 1,2, Alan Aprl 1 and Alan Abran 1 1 Department of Software Engneerng
More informationTHE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION
Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh
More informationA Dynamic Load Balancing for Massive Multiplayer Online Game Server
A Dynamc Load Balancng for Massve Multplayer Onlne Game Server Jungyoul Lm, Jaeyong Chung, Jnryong Km and Kwanghyun Shm Dgtal Content Research Dvson Electroncs and Telecommuncatons Research Insttute Daejeon,
More informationParallel Numerical Simulation of Visual Neurons for Analysis of Optical Illusion
212 Thrd Internatonal Conference on Networkng and Computng Parallel Numercal Smulaton of Vsual Neurons for Analyss of Optcal Illuson Akra Egashra, Shunj Satoh, Hdetsugu Ire and Tsutomu Yoshnaga Graduate
More informationiavenue iavenue i i i iavenue iavenue iavenue
Saratoga Systems' enterprse-wde Avenue CRM system s a comprehensve web-enabled software soluton. Ths next generaton system enables you to effectvely manage and enhance your customer relatonshps n both
More informationLuby s Alg. for Maximal Independent Sets using Pairwise Independence
Lecture Notes for Randomzed Algorthms Luby s Alg. for Maxmal Independent Sets usng Parwse Independence Last Updated by Erc Vgoda on February, 006 8. Maxmal Independent Sets For a graph G = (V, E), an ndependent
More informationPresent Values and Accumulations
Present Values an Accumulatons ANGUS S. MACDONALD Volume 3, pp. 1331 1336 In Encyclopea Of Actuaral Scence (ISBN -47-84676-3) Ete by Jozef L. Teugels an Bjørn Sunt John Wley & Sons, Lt, Chchester, 24 Present
More informationGender Classification for Real-Time Audience Analysis System
Gender Classfcaton for Real-Tme Audence Analyss System Vladmr Khryashchev, Lev Shmaglt, Andrey Shemyakov, Anton Lebedev Yaroslavl State Unversty Yaroslavl, Russa vhr@yandex.ru, shmaglt_lev@yahoo.com, andrey.shemakov@gmal.com,
More informationConversion between the vector and raster data structures using Fuzzy Geographical Entities
Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,
More informationOxygen Saturation Measurement and Optimal Accuracy in Nair
The Applcaton of Threshold De-nosng n Moble Oxygen Saturaton Montorng Software Tang Nng and Xu Zhenzhen School of Computer Scence and Technology, Donghua Unversty, Shangha, Chna 201620 tnwysyd@126.com
More informationData Mining from the Information Systems: Performance Indicators at Masaryk University in Brno
Data Mnng from the Informaton Systems: Performance Indcators at Masaryk Unversty n Brno Mkuláš Bek EUA Workshop Strasbourg, 1-2 December 2006 1 Locaton of Brno Brno EUA Workshop Strasbourg, 1-2 December
More informationSCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS
SCHEDULING OF CONSTRUCTION PROJECTS BY MEANS OF EVOLUTIONARY ALGORITHMS Magdalena Rogalska 1, Wocech Bożeko 2,Zdzsław Heduck 3, 1 Lubln Unversty of Technology, 2- Lubln, Nadbystrzycka 4., Poland. E-mal:rogalska@akropols.pol.lubln.pl
More informationCharacterization of Assembly. Variation Analysis Methods. A Thesis. Presented to the. Department of Mechanical Engineering. Brigham Young University
Characterzaton of Assembly Varaton Analyss Methods A Thess Presented to the Department of Mechancal Engneerng Brgham Young Unversty In Partal Fulfllment of the Requrements for the Degree Master of Scence
More information