Full-Text Clustering Methods for Current Research Directions Detection *
|
|
|
- Scarlett York
- 9 years ago
- Views:
Transcription
1 Full-Text Clusterng Methods for Current Research Drectons Detecton * Dmtry Devyatn Ilya Thomrov Alexander Shvets Oleg Grgorev Insttute for systems analyss of RAS, Moscow [email protected], [email protected], [email protected] [email protected] Konstantn Popov Engelhardt Insttute of Molecular Bology of RAS, Moscow [email protected] Abstract The paper contans a bref overvew of full-text clusterng methods for current research drectons detecton. A novel full-text clusterng method s proposed. A dataset s created and expermental results are verfed by problem doman Regeneratve medcne experts wth PhD degrees. The proposed method s well applcable for research drectons detecton accordng to expermental results. Fnally, prospects and drawbacs of the proposed method are dscussed. * The research s supported by Russan Foundaton for Basc Research, project of_m 1 Introducton Commonly methods for research drectons detecton are based on dfferent clusterng methods [4]. The problem s that all these methods requre tunng to a research area and dataset. Common clusterng evaluaton metrcs are not applcable for drectons detecton. They are often based on degree of nsularty of clusters [1, 12]. But crtera for cluster buldng are wealy formalzed n drectons detecton tas. Several approaches use emprcal estmates, determned by experts, for solvng ths problem [6]. However, obtanng of these estmates s complcated and nondetermnstc process, whch requres nteracton between data-mnng experts and experts of analyzed research area. Moreover, crtera for ncluson of some scentfc paper to a research drecton depends on research area of ths paper, thus an opnon of experts n ths research area should be taen nto consderaton. Therefore, accordng to specalty of the problem a semautomatcally approach, that uses a small labeled part of analyzed dataset for tranng clusterng methods, s proposed n ths paper. Let D { d1, d2,.., dn} s a corpus of scentfc papers; where n s a number of Proceedngs of the XVII Internatonal Conference «Data Analytcs and Management n Data Intensve Domans» (DAMDID/RCDL 2015), Obnns, Russa, October 13-16, 2015 documents n ths collecton and C { c1, c2,.., cn} s a set of prevously nown research drectons n D. Suppose D c D s a small ncomplete set of papers that are n C. The goal s to get full set of research drectons C f and allocaton of papers to ths set. The approach conssts n applyng a qualty functon that can estmate dfference between clusterng method results and gven dstrbuton C. Then methods of constraned optmzaton are used for tunng parameters of the clusterng method. Unle well-nown classfer tranng tas, the complete drectons set C f s a pror unnown. So t s necessary to fnd a method, whch can be tuned n applance wth ths approach properly. The prmary goal of ths paper s to present an mproved method for research drectons detecton, whch can process datasets sem-automatcally. Besdes, we present results of comparson between proposed method and the state-of-the-art clusterng methods such as Brch [19], affnty propagaton [5] and ther combnatons wth topcal latent Drchlet allocaton model [3]. 2 Related wor Consder some methods for scence drectons detecton. There are three groups of methods, used for research drecton detecton. All of them use clusterng methods, but dffer n the appled measure of smlarty detecton: the co-ctaton measure [4], the text measure or the hybrd measure. The last one usually s based on an assessment of co-ctatons of papers, on the ntellgent analyss of texts and on allocaton of sgnfcant papers [6]. In [9] defnton of the prospectve research drectons s carred out usng co-word analyss,.e. papers smlarty. Ths approach s close to generatve topc dstrbuton clusterng methods. Applcaton of ths method for detecton of regulartes and trends n the area of nformaton securty s shown. The SCI base [13] s used as data source; eywords are collected from the lst of the terms allocated by authors of papers. These eywords were normalzed. To chec dynamc 152
2 changes n the area of nformaton securty, authors offer the whole set of methods for marng of papers. They conclude that n the area of nformaton securty there s constant drectons set, and at the same tme new drectons emerge regularly. The man dsadvantage of ths method s that t uses only structured authors eywords. Ths approach leads to decreasng of methods objectvty; because authors eywords often dstngush from the real terms of a paper. In ths wor we wll use full-text clusterng approaches because of ther unversalty: they do not need ctaton databases for proper results. Let us dscuss several clusterng methods. For example, affnty propagaton s one of common clusterng methods. In ths method a set of papers s consdered as a networ of connected nodes, and the smlarty of papers corresponds to the weght of edges n ths networ. The method s based on mechansm of passng messages between nodes of the networ [5]. Affnty propagaton detects exemplars - papers n the nput dataset that are representatve for clusters. The networ nodes send to each other messages untl a set of exemplars and clusters gradually emerges. Potental drawbac of ths method s ntally exemplars selecton. One cannot provde ntally exemplars dstrbuton for all drectons, so the exemplars can be selected naccurately. In addton, the approach can lead to naccuraces n cases when t s mpossble to dentfy an exemplar whch descrbes all the papers n the cluster properly [10]. Unle affnty propagaton, proposed method uses terms descrptors to descrbe clusters. These terms correspond to all papers of a cluster (due to the usng of the topc mportance measure for the terms weghtng) [14] that leads to better qualty of clusterng. Another common clusterng method s Brch [19]. Ths algorthm bulds a weghted balanced tree (CF-tree) n a sngle pass through the data. The tree stores nformaton about sub-clusters n leaves of the tree. Durng the clusterng each paper s added to the exstng leaf, or a new leaf s created. Brch also can be used n stream clusterng when the number of documents to be processed can be theoretcally nfnte. Specalzed methods for text clusterng are worth mentonng. In [14] well-scalable full-text clusterng approach for detectng research drectons was proposed. Hash functons are wdely used n ths method n order to get good performance. Ths method has nsuffcent coverage of a dataset: there are large amount of papers, whch do not belong to any of the clusters. In ths study we mprove classfcaton decson rule for better coverage. We also refne extractng of cores of clusters for better clusterng qualty. The methods of topc modellng based on the generatng models are often appled for full-text clusterng. For example, Latent Drchlet Allocaton (LDA) [3] s appled for clusterng of large amounts of papers. The method s an mprovement of Latent Semantc Indexng. It assumes that each object s presented n several classes whch dstrbuton s ncluded nto parametrcal famly of Drchlet dstrbuton. A drawbac of the method s nstablty to nput data, that maes the results unnterpretable [8, 18]. We use LDA as a prelmnary step for clusterng by well-nown methods. Ths apples to sgnfcantly reduce a feature space length and so to ncrease the common clusterng methods performance and qualty. 3 Method descrpton 3.1 Fast algorthm for smlar document search The cornerstone of proposed full-text clusterng method s smlar document search, prevously dscussed n [15]. Ths algorthm use nverted ndex structures for nput data. Most of the full-text search algorthms apply smlar structures for mprovng search performance [17]. Let w s a term (a word or a phrase) from a paper, t weght of the term. We use wellnown tf df measure for weghtng terms n papers. It s a product of term frequency functon tf and nverse document frequency functon df. tf ( w, d) Where n s number of tmes term w occurs n the paper d, n s a number of all terms occurrences n d. D df ( w, log d w Where D s the number of papers n corpus D, d w s the number of papers from D that have term w and the logarthm base does not matter. Defne drect ndex of papers as a functon that returns a set of dfferent terms and ther weghts from the paper d. D d) { w, t, w, t,.., w, t } dx( n n Where n s the number of dfferent terms n the document. Defne nverted ndex of papers as a functon that returns a set of dfferent papers, n whch gven term w s occurred and weghts of ths term n these papers: Idx( w) { d1, t1, d2, t2,.., dn, tn }. The algorthm of the fast smlar document search s the followng. 1. Get terms of the paper d from ts drect ndex D dx (d). 2. Retreve lst of papers contanng the terms got n step 1 from nverted ndex I dx (w). 3. Flter papers that are wealy ntersected by the terms of the nput paper. 4. Get lsts of terms for retreved papers from D dx (d). 5. Calculate Manhattan [2] dstance between the lsts of terms got n step 4. We chose ths dstance because t s fast to calculate and n n 153
3 provdes qualty comparable to more complcated metrcs le cosne dstance [16]. 6. Flter papers wth hgh dstance to nput paper accordng to predefned threshold H. In step 3 of ths algorthm we cut wealy smlar papers wthout calculatng a dstance measure that sgnfcantly mproves the algorthm. Steps 1-4 can be performed n parallel, whch mproves the performance. 3.2 Clusterng algorthm Suppose a descrptor of a cluster s a set of terms, relevant to ths cluster. Let an ntal core of a cluster s a subset of hghly smlar papers, whch s used for generaton of a descrptor. A descrptor s bult based on full-text of these papers. Let Core(d ) s a functon that returns a tuple m for gven paper d, where c s an dentfer for ntal core of cluster and m s a numerc characterstc of paper d n ntal core c. Characterstc m s used for decreasng of dstance threshold H durng buldng of ntal core, that prevents mergng of ntal cores. If paper d s not presented n any ntal core, the functon Core(d ) returns empty tuple. Drect ndex of descrptors s a functon that returns a set of dfferent terms and ther weghts from the descrptor of cluster CDdx( c) { w1, t1, w2, t2,.., wn, tn } where n s count of dfferent terms n the descrptor. Inverted ndex of descrptors s a functon that returns a set of descrptors n whch gven term w s occurred and weghts of ths term n these descrptors: CIdx( w) { c1, t1, c2, t2,.., cn, tn }. We use topc mportance measure I for weghtng terms of descrptors. Ths measure for term w from corpus D and cluster c s calculated as follows: ( w, df ( w, df ( w, Dc ) I( w, ( w, X( ( w, ), where Dc s a set of papers from cluster c and X () s a Heavsde step functon. Due to the topc mportance measure, descrptor of cluster conssts mostly of terms specfc for papers of the cluster. Let H a and are clusterng thresholds, affectng on nsularty of clusters. The proposed full-text clusterng method contans two parts. The frst part could be performed ndependently on dfferent nodes of computer networ. Ths part conventonally can be called ntal cores of clusters detecton. The steps are the followng. 1. Index the nput corpus D. 2. Whle D s not empty, exclude a paper d from D. If D s empty, turn to step 6. If Core(d ) returns empty tuple, then turn to step 3, else turn to step Generate new cluster core ndex c. Add new tuple 1 wth m 1 to Core. Then turn to step Get dentfer c for current ntal core and m for current paper. Reduce m m. 5. Search smlar papers by the fast algorthm wth threshold H H a m. Add t to Core wth current value of m. Then turn to step Buld descrptors CD dx (c) and CI dx (w) for each ntal core c. The second part s performed n a selected supervsor node and conssts of the followng steps. 1. Search smlar descrptors by the fast algorthm usng CDdx and CI dx. Then merge these descrptors. In practce ths step can be useful, f the frst secton executes on several computer nodes. 2. Classfy all papers. For each retreved descrptor use steps 2-6 from the fast smlar search algorthm and resultng CDdx as a lst of target terms. To prevent fuzzy clusterng each paper s labelled by dentfer of ts cluster and by value of ts smlarty to descrptor of ths cluster. Classfer uses these labels to nclude each paper n the most smlar cluster. The advantage of the proposed method s ts dstrbuted nature thereby full-text clusterng of large amounts of papers can be mplemented. That s necessary for detectng current research drectons represented n wde research area. Another beneft conssts n usng ndexes whch have ncremental structure, so addng a number of papers to the corpus D does not nvolve full re-clusterng. 4 Experment 4.1 Dataset descrpton For experment a dataset of papers of the research area Regeneratve medcne s created and verfed by experts wth PhD degree. The dataset contans 112 wellcted publcly accessble papers from 2000 to 2014 dstrbuted nto 4 research drectons. We apply wdely used lngustc analyss lbrary Freelng [11] to retreve normalzed terms (words and noun phrases wthout stop words) from text of papers n our study. These terms are ncluded n the dataset. The dataset s avalable on demand, and can be extended and used accordng to the BSD lcense. Descrptor Sze Average publcaton year Bran stroe Chmerc antgen receptor cell therapy Induced plurpotent stem cells Wound healng burns Experment setup 154
4 Descrptor Bran stroe Chmerc antgen receptor cell therapy Induced plurpotent stem cells Wound healng burns Terms stroe, brdu test, behavoral, endothelal cell, cell brdu, psa-ncam, cortcal cell, reactve, regeneraton neuronal, endogenous, neurogeness, endostatn, cortcal, expanson nonhematopoet schemc neuron pbls, receptor chmer cd19-spec, mmunty, anttumor, adoptve fn, malgnancy b-cell, aapcs, proten fuson, nfuson cell gene sgnature, photoreceptor, epthelum retnal, retnal, suppressor, tumor, cell ps, technology ps, retnal cell, ps-derved, pscs pgmented wound chron wound heal, eratnocyte, epdermal cell, fuseng eratn, epderms regon, We use the sem-automatc approach for tunng clusterng methods that can avod emprcal parameters where the parameter settng precson prorty over recall. In our experment, 1. We use a controlled Measure Affnty propagaton Brch LDA + Affnty propagaton LDA + Brch Proposed method Precson Recall F-measure setup. Clusterng parameters are calculated by maxmzaton of the qualty functon wth boundares 0 H a 1 and (for the proposed method). For other methods we set boundares accordng to constrants of these methods. We apply commonly used F-measure functon for classfcaton methods assessment. Ths functon based on precson and recall. Recall R s defned as the rato of the number of correctly classfed papers to sze of class n the tran set: t R, t c where t s a number of correctly classfed papers; c s a number of papers whch the method carred to other classes. Precson P s defned as the rato of number of correctly classfed papers to the total of classfed papers: t P, t where t s a number of correctly classfed papers; s a number of ncorrectly classfed papers. Then the f-measure calculates as follows: 2 ( 1) PR F, 2 P R random search method [7] for optmzaton of qualty functon. 4.3 Experment result Thus, we show that proposed method can be tuned properly usng a part of analyzed dataset. Results show that proposed method has suffcent qualty of clusterng (Table 2). Generated terms represent research drectons from the dataset. Table 3 shows experment results usng cross-valdaton technque. Both qualtatve and quanttatve assessment shows that the proposed method produces relatvely good results. In opposte, the affnty propagaton method returns unsatsfactory results. Obvously ths method s not sutable for the soluton of research drecton detecton problem. Possbly t relates to ncorrect exemplars selecton due to the mssng of the ntal dstrbuton for exemplars. We suggest that our method overcomes other examned methods due to applyng Manhattan dstance for paper smlarty estmaton, to usng of topc mportance measure for determnaton of terms of clusters and due to mplementaton of lngustc analyss for terms extracton as well. That leads to more precse estmaton of smlarty measure between papers and, consequently, to better qualty of research drectons detecton. 155
5 5 Concluson and future wor In ths study we nvestgate varous methods for detecton of current research drectons, whch are based on clusterng. We found that t s necessary to create a method that can be tuned successfully n a small labelled part of analyzed dataset. In ths paper we present the mproved full-text clusterng method for detecton of research drectons. Experment results show that proposed method s more applcable for research drectons detecton, than the other wdely used methods. The sem-automatc approach for tunng of clusterng method demonstrates ts performance also. Besdes, t was shown that fulltext clusterng methods wor mprecsely for thematcally heterogeneous research drectons. Ths paper s consdered as an ntal study that wll be extended n a further research. Moreover, the further development of proposed method conssts n mplementaton of hybrd metrcs for clusterng. These metrcs can be used not only texts smlarty, but also co-ctatons and co-authorshps of papers n dataset. It s necessary to contnue wor wth experts for extenson of our expermental dataset, because hybrd clusterng methods cannot process small datasets properly. References [1] Aletras N. and Stevenson M. Evaluatng topc coherence usng dstrbutonal semantcs // Proceedngs of the 10th Internatonal Conference on Computatonal Semantcs (IWCS 2013) Long Papers pp [2] Blac P. E. Manhattan dstance // Dctonary of Algorthms and Data Structures Vol. 18. [3] Ble D. M., Ng A. Y. and Jordan M. I. Latent drchlet allocaton // The Journal of machne Learnng research Vol. 3. pp [4] Cobo M. J. et al. Scence mappng software tools: Revew, analyss, and cooperatve study among tools. // Journal of the Amercan Socety for Informaton Scence and Technology Vol. 62(7). pp [5] Frey B. J. and Duec D. Clusterng by passng messages between data ponts. // Scence Vol. 315(5814). pp [6] Glanzel W. Bblometrc methods for detectng and analysng emergng research topcs. // El profesonal de la nformacon Vol. 21(2). pp [7] Kaelo P. and Al M. Some varants of the controlled random search algorthm for global optmzaton // J. Optm. Theory Appl Vol. 130(2). pp [8] Koltcov S., Koltsova O., Noleno S. Latent drchlet allocaton: stablty and applcatons to studes of user-generated content //Proceedngs of the 2014 ACM conference on Web scence. ACM, pp [9] Lee W. H. How to dentfy emergng research felds usng scentometrcs: An example n the feld of nformaton securty // Scentometrcs Vol. 76(3). pp [10] Leone M. et al. Clusterng by soft-constrant affnty propagaton: applcatons to geneexpresson data //Bonformatcs Vol. 23 (20). pp [11] Padró Lluís and Stanlovsy Evgeny. FreeLng 3.0: Towards Wder Multlngualty // Proceedngs of the Language Resources and Evaluaton Conference (LREC 2012). Istanbul, [12] Rousseeuw P. J. Slhouettes: a graphcal ad to the nterpretaton and valdaton of cluster analyss // Journal of computatonal and appled mathematcs Vol. 20. pp [13] Scence Ctaton Index by Thomson Reuters [14] Shvets A. et al. Proceedngs of the Scence and Informaton Conference // Detecton of Current Research Drectons Based on Full-Text Clusterng. London, [15] Sochenov I. Relatonal-stuatonal data structures, algorthms and methods for search and analytcal tass solvng [n Russan]. PhD thess. Insttute for systems analyss of RAS, Moscow, [16] Suvorov R. E. and Sochenov I. V. Method for detectng relatonshps between sc-tech documents based on topc mportance characterstc. [In Russan] ISA RAS, Moscow, Vol. 1. pp [17] Sten B. Prncples of hash-based text retreval //Proceedngs of the 30th annual nternatonal ACM SIGIR conference on Research and development n nformaton retreval. ACM, [18] Vorontsov K. and Potapeno A. Addtve regularzaton of topc models // Machne Learnng pp [19] Zhang T., Ramarshnan R., and Lvny M. BIRCH: an effcent data clusterng method for very large databases. // ACM SIGMOD Record. - ACM, Vol. 25(2). pp
The Development of Web Log Mining Based on Improve-K-Means Clustering Analysis
The Development of Web Log Mnng Based on Improve-K-Means Clusterng Analyss TngZhong Wang * College of Informaton Technology, Luoyang Normal Unversty, Luoyang, 471022, Chna [email protected] Abstract.
What is Candidate Sampling
What s Canddate Samplng Say we have a multclass or mult label problem where each tranng example ( x, T ) conssts of a context x a small (mult)set of target classes T out of a large unverse L of possble
NEURO-FUZZY INFERENCE SYSTEM FOR E-COMMERCE WEBSITE EVALUATION
NEURO-FUZZY INFERENE SYSTEM FOR E-OMMERE WEBSITE EVALUATION Huan Lu, School of Software, Harbn Unversty of Scence and Technology, Harbn, hna Faculty of Appled Mathematcs and omputer Scence, Belarusan State
An Interest-Oriented Network Evolution Mechanism for Online Communities
An Interest-Orented Network Evoluton Mechansm for Onlne Communtes Cahong Sun and Xaopng Yang School of Informaton, Renmn Unversty of Chna, Bejng 100872, P.R. Chna {chsun,yang}@ruc.edu.cn Abstract. Onlne
Forecasting the Direction and Strength of Stock Market Movement
Forecastng the Drecton and Strength of Stock Market Movement Jngwe Chen Mng Chen Nan Ye [email protected] [email protected] [email protected] Abstract - Stock market s one of the most complcated systems
Logistic Regression. Lecture 4: More classifiers and classes. Logistic regression. Adaboost. Optimization. Multiple class classification
Lecture 4: More classfers and classes C4B Machne Learnng Hlary 20 A. Zsserman Logstc regresson Loss functons revsted Adaboost Loss functons revsted Optmzaton Multple class classfcaton Logstc Regresson
Conversion between the vector and raster data structures using Fuzzy Geographical Entities
Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,
Forecasting the Demand of Emergency Supplies: Based on the CBR Theory and BP Neural Network
700 Proceedngs of the 8th Internatonal Conference on Innovaton & Management Forecastng the Demand of Emergency Supples: Based on the CBR Theory and BP Neural Network Fu Deqang, Lu Yun, L Changbng School
A Secure Password-Authenticated Key Agreement Using Smart Cards
A Secure Password-Authentcated Key Agreement Usng Smart Cards Ka Chan 1, Wen-Chung Kuo 2 and Jn-Chou Cheng 3 1 Department of Computer and Informaton Scence, R.O.C. Mltary Academy, Kaohsung 83059, Tawan,
WISE-Integrator: An Automatic Integrator of Web Search Interfaces for E-Commerce
WSE-ntegrator: An Automatc ntegrator of Web Search nterfaces for E-Commerce Ha He, Wey Meng Dept. of Computer Scence SUNY at Bnghamton Bnghamton, NY 13902 {hahe,meng}@cs.bnghamton.edu Clement Yu Dept.
Feature selection for intrusion detection. Slobodan Petrović NISlab, Gjøvik University College
Feature selecton for ntruson detecton Slobodan Petrovć NISlab, Gjøvk Unversty College Contents The feature selecton problem Intruson detecton Traffc features relevant for IDS The CFS measure The mrmr measure
How To Understand The Results Of The German Meris Cloud And Water Vapour Product
Ttel: Project: Doc. No.: MERIS level 3 cloud and water vapour products MAPP MAPP-ATBD-ClWVL3 Issue: 1 Revson: 0 Date: 9.12.1998 Functon Name Organsaton Sgnature Date Author: Bennartz FUB Preusker FUB Schüller
FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES
FREQUENCY OF OCCURRENCE OF CERTAIN CHEMICAL CLASSES OF GSR FROM VARIOUS AMMUNITION TYPES Zuzanna BRO EK-MUCHA, Grzegorz ZADORA, 2 Insttute of Forensc Research, Cracow, Poland 2 Faculty of Chemstry, Jagellonan
Document Clustering Analysis Based on Hybrid PSO+K-means Algorithm
Document Clusterng Analyss Based on Hybrd PSO+K-means Algorthm Xaohu Cu, Thomas E. Potok Appled Software Engneerng Research Group, Computatonal Scences and Engneerng Dvson, Oak Rdge Natonal Laboratory,
Project Networks With Mixed-Time Constraints
Project Networs Wth Mxed-Tme Constrants L Caccetta and B Wattananon Western Australan Centre of Excellence n Industral Optmsaton (WACEIO) Curtn Unversty of Technology GPO Box U1987 Perth Western Australa
Institute of Informatics, Faculty of Business and Management, Brno University of Technology,Czech Republic
Lagrange Multplers as Quanttatve Indcators n Economcs Ivan Mezník Insttute of Informatcs, Faculty of Busness and Management, Brno Unversty of TechnologCzech Republc Abstract The quanttatve role of Lagrange
A DATA MINING APPLICATION IN A STUDENT DATABASE
JOURNAL OF AERONAUTICS AND SPACE TECHNOLOGIES JULY 005 VOLUME NUMBER (53-57) A DATA MINING APPLICATION IN A STUDENT DATABASE Şenol Zafer ERDOĞAN Maltepe Ünversty Faculty of Engneerng Büyükbakkalköy-Istanbul
Design and Development of a Security Evaluation Platform Based on International Standards
Internatonal Journal of Informatcs Socety, VOL.5, NO.2 (203) 7-80 7 Desgn and Development of a Securty Evaluaton Platform Based on Internatonal Standards Yuj Takahash and Yoshm Teshgawara Graduate School
An Alternative Way to Measure Private Equity Performance
An Alternatve Way to Measure Prvate Equty Performance Peter Todd Parlux Investment Technology LLC Summary Internal Rate of Return (IRR) s probably the most common way to measure the performance of prvate
Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*
Probablstc Latent Semantc User Segmentaton for Behavoral Targeted Advertsng* Xaohu Wu 1,2, Jun Yan 2, Nng Lu 2, Shucheng Yan 3, Yng Chen 1, Zheng Chen 2 1 Department of Computer Scence Bejng Insttute of
Single and multiple stage classifiers implementing logistic discrimination
Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,
Calculation of Sampling Weights
Perre Foy Statstcs Canada 4 Calculaton of Samplng Weghts 4.1 OVERVIEW The basc sample desgn used n TIMSS Populatons 1 and 2 was a two-stage stratfed cluster desgn. 1 The frst stage conssted of a sample
CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements
Lecture 3 Densty estmaton Mlos Hauskrecht [email protected] 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there
Multiple-Period Attribution: Residuals and Compounding
Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens
Design of Output Codes for Fast Covering Learning using Basic Decomposition Techniques
Journal of Computer Scence (7): 565-57, 6 ISSN 59-66 6 Scence Publcatons Desgn of Output Codes for Fast Coverng Learnng usng Basc Decomposton Technques Aruna Twar and Narendra S. Chaudhar, Faculty of Computer
Enterprise Master Patient Index
Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an
A Replication-Based and Fault Tolerant Allocation Algorithm for Cloud Computing
A Replcaton-Based and Fault Tolerant Allocaton Algorthm for Cloud Computng Tork Altameem Dept of Computer Scence, RCC, Kng Saud Unversty, PO Box: 28095 11437 Ryadh-Saud Araba Abstract The very large nfrastructure
IMPACT ANALYSIS OF A CELLULAR PHONE
4 th ASA & μeta Internatonal Conference IMPACT AALYSIS OF A CELLULAR PHOE We Lu, 2 Hongy L Bejng FEAonlne Engneerng Co.,Ltd. Bejng, Chna ABSTRACT Drop test smulaton plays an mportant role n nvestgatng
A Performance Analysis of View Maintenance Techniques for Data Warehouses
A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao
Data Broadcast on a Multi-System Heterogeneous Overlayed Wireless Network *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 24, 819-840 (2008) Data Broadcast on a Mult-System Heterogeneous Overlayed Wreless Network * Department of Computer Scence Natonal Chao Tung Unversty Hsnchu,
Damage detection in composite laminates using coin-tap method
Damage detecton n composte lamnates usng con-tap method S.J. Km Korea Aerospace Research Insttute, 45 Eoeun-Dong, Youseong-Gu, 35-333 Daejeon, Republc of Korea [email protected] 45 The con-tap test has the
Using Content-Based Filtering for Recommendation 1
Usng Content-Based Flterng for Recommendaton 1 Robn van Meteren 1 and Maarten van Someren 2 1 NetlnQ Group, Gerard Brandtstraat 26-28, 1054 JK, Amsterdam, The Netherlands, [email protected] 2 Unversty of
Face Verification Problem. Face Recognition Problem. Application: Access Control. Biometric Authentication. Face Verification (1:1 matching)
Face Recognton Problem Face Verfcaton Problem Face Verfcaton (1:1 matchng) Querymage face query Face Recognton (1:N matchng) database Applcaton: Access Control www.vsage.com www.vsoncs.com Bometrc Authentcaton
Web Object Indexing Using Domain Knowledge *
Web Object Indexng Usng Doman Knowledge * Muyuan Wang Department of Automaton Tsnghua Unversty Bejng 100084, Chna (86-10)51774518 Zhwe L, Le Lu, We-Yng Ma Mcrosoft Research Asa Sgma Center, Hadan Dstrct
Semantic Link Analysis for Finding Answer Experts *
JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 51-65 (2012) Semantc Lnk Analyss for Fndng Answer Experts * YAO LU 1,2,3, XIAOJUN QUAN 2, JINGSHENG LEI 4, XINGLIANG NI 1,2,3, WENYIN LIU 2,3 AND YINLONG
Lecture 2: Single Layer Perceptrons Kevin Swingler
Lecture 2: Sngle Layer Perceptrons Kevn Sngler [email protected] Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses
320 The Internatonal Arab Journal of Informaton Technology, Vol. 5, No. 3, July 2008 Comparsons Between Data Clusterng Algorthms Osama Abu Abbas Computer Scence Department, Yarmouk Unversty, Jordan Abstract:
BUSINESS PROCESS PERFORMANCE MANAGEMENT USING BAYESIAN BELIEF NETWORK. 0688, [email protected]
Proceedngs of the 41st Internatonal Conference on Computers & Industral Engneerng BUSINESS PROCESS PERFORMANCE MANAGEMENT USING BAYESIAN BELIEF NETWORK Yeong-bn Mn 1, Yongwoo Shn 2, Km Jeehong 1, Dongsoo
Gender Classification for Real-Time Audience Analysis System
Gender Classfcaton for Real-Tme Audence Analyss System Vladmr Khryashchev, Lev Shmaglt, Andrey Shemyakov, Anton Lebedev Yaroslavl State Unversty Yaroslavl, Russa [email protected], [email protected], [email protected],
Traffic State Estimation in the Traffic Management Center of Berlin
Traffc State Estmaton n the Traffc Management Center of Berln Authors: Peter Vortsch, PTV AG, Stumpfstrasse, D-763 Karlsruhe, Germany phone ++49/72/965/35, emal [email protected] Peter Möhl, PTV AG,
J. Parallel Distrib. Comput.
J. Parallel Dstrb. Comput. 71 (2011) 62 76 Contents lsts avalable at ScenceDrect J. Parallel Dstrb. Comput. journal homepage: www.elsever.com/locate/jpdc Optmzng server placement n dstrbuted systems n
Overview of monitoring and evaluation
540 Toolkt to Combat Traffckng n Persons Tool 10.1 Overvew of montorng and evaluaton Overvew Ths tool brefly descrbes both montorng and evaluaton, and the dstncton between the two. What s montorng? Montorng
Can Auto Liability Insurance Purchases Signal Risk Attitude?
Internatonal Journal of Busness and Economcs, 2011, Vol. 10, No. 2, 159-164 Can Auto Lablty Insurance Purchases Sgnal Rsk Atttude? Chu-Shu L Department of Internatonal Busness, Asa Unversty, Tawan Sheng-Chang
Stochastic Protocol Modeling for Anomaly Based Network Intrusion Detection
Stochastc Protocol Modelng for Anomaly Based Network Intruson Detecton Juan M. Estevez-Tapador, Pedro Garca-Teodoro, and Jesus E. Daz-Verdejo Department of Electroncs and Computer Technology Unversty of
Fast Fuzzy Clustering of Web Page Collections
Fast Fuzzy Clusterng of Web Page Collectons Chrstan Borgelt and Andreas Nürnberger Dept. of Knowledge Processng and Language Engneerng Otto-von-Guercke-Unversty of Magdeburg Unverstätsplatz, D-396 Magdeburg,
How To Classfy Onlne Mesh Network Traffc Classfcaton And Onlna Wreless Mesh Network Traffic Onlnge Network
Journal of Computatonal Informaton Systems 7:5 (2011) 1524-1532 Avalable at http://www.jofcs.com Onlne Wreless Mesh Network Traffc Classfcaton usng Machne Learnng Chengje GU 1,, Shuny ZHANG 1, Xaozhen
An Enhanced Super-Resolution System with Improved Image Registration, Automatic Image Selection, and Image Enhancement
An Enhanced Super-Resoluton System wth Improved Image Regstraton, Automatc Image Selecton, and Image Enhancement Yu-Chuan Kuo ( ), Chen-Yu Chen ( ), and Chou-Shann Fuh ( ) Department of Computer Scence
Research on Evaluation of Customer Experience of B2C Ecommerce Logistics Enterprises
3rd Internatonal Conference on Educaton, Management, Arts, Economcs and Socal Scence (ICEMAESS 2015) Research on Evaluaton of Customer Experence of B2C Ecommerce Logstcs Enterprses Yle Pe1, a, Wanxn Xue1,
Module 2 LOSSLESS IMAGE COMPRESSION SYSTEMS. Version 2 ECE IIT, Kharagpur
Module LOSSLESS IMAGE COMPRESSION SYSTEMS Lesson 3 Lossless Compresson: Huffman Codng Instructonal Objectves At the end of ths lesson, the students should be able to:. Defne and measure source entropy..
How To Know The Components Of Mean Squared Error Of Herarchcal Estmator S
S C H E D A E I N F O R M A T I C A E VOLUME 0 0 On Mean Squared Error of Herarchcal Estmator Stans law Brodowsk Faculty of Physcs, Astronomy, and Appled Computer Scence, Jagellonan Unversty, Reymonta
Mining Multiple Large Data Sources
The Internatonal Arab Journal of Informaton Technology, Vol. 7, No. 3, July 2 24 Mnng Multple Large Data Sources Anmesh Adhkar, Pralhad Ramachandrarao 2, Bhanu Prasad 3, and Jhml Adhkar 4 Department of
A Dynamic Energy-Efficiency Mechanism for Data Center Networks
A Dynamc Energy-Effcency Mechansm for Data Center Networks Sun Lang, Zhang Jnfang, Huang Daochao, Yang Dong, Qn Yajuan A Dynamc Energy-Effcency Mechansm for Data Center Networks 1 Sun Lang, 1 Zhang Jnfang,
A novel Method for Data Mining and Classification based on
A novel Method for Data Mnng and Classfcaton based on Ensemble Learnng 1 1, Frst Author Nejang Normal Unversty;Schuan Nejang 641112,Chna, E-mal: [email protected] Abstract Data mnng has been attached great
Statistical Methods to Develop Rating Models
Statstcal Methods to Develop Ratng Models [Evelyn Hayden and Danel Porath, Österrechsche Natonalbank and Unversty of Appled Scences at Manz] Source: The Basel II Rsk Parameters Estmaton, Valdaton, and
benefit is 2, paid if the policyholder dies within the year, and probability of death within the year is ).
REVIEW OF RISK MANAGEMENT CONCEPTS LOSS DISTRIBUTIONS AND INSURANCE Loss and nsurance: When someone s subject to the rsk of ncurrng a fnancal loss, the loss s generally modeled usng a random varable or
Calculating the high frequency transmission line parameters of power cables
< ' Calculatng the hgh frequency transmsson lne parameters of power cables Authors: Dr. John Dcknson, Laboratory Servces Manager, N 0 RW E B Communcatons Mr. Peter J. Ncholson, Project Assgnment Manager,
Cloud-based Social Application Deployment using Local Processing and Global Distribution
Cloud-based Socal Applcaton Deployment usng Local Processng and Global Dstrbuton Zh Wang *, Baochun L, Lfeng Sun *, and Shqang Yang * * Bejng Key Laboratory of Networked Multmeda Department of Computer
On the Optimal Control of a Cascade of Hydro-Electric Power Stations
On the Optmal Control of a Cascade of Hydro-Electrc Power Statons M.C.M. Guedes a, A.F. Rbero a, G.V. Smrnov b and S. Vlela c a Department of Mathematcs, School of Scences, Unversty of Porto, Portugal;
Frequency Selective IQ Phase and IQ Amplitude Imbalance Adjustments for OFDM Direct Conversion Transmitters
Frequency Selectve IQ Phase and IQ Ampltude Imbalance Adjustments for OFDM Drect Converson ransmtters Edmund Coersmeer, Ernst Zelnsk Noka, Meesmannstrasse 103, 44807 Bochum, Germany [email protected],
Realistic Image Synthesis
Realstc Image Synthess - Combned Samplng and Path Tracng - Phlpp Slusallek Karol Myszkowsk Vncent Pegoraro Overvew: Today Combned Samplng (Multple Importance Samplng) Renderng and Measurng Equaton Random
Dynamic Fuzzy Pattern Recognition
Dynamc Fuzzy Pattern Recognton Von der Fakultät für Wrtschaftswssenschaften der Rhensch-Westfälschen Technschen Hochschule Aachen zur Erlangung des akademschen Grades enes Doktors der Wrtschafts- und Sozalwssenschaften
THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION
Internatonal Journal of Electronc Busness Management, Vol. 3, No. 4, pp. 30-30 (2005) 30 THE APPLICATION OF DATA MINING TECHNIQUES AND MULTIPLE CLASSIFIERS TO MARKETING DECISION Yu-Mn Chang *, Yu-Cheh
Searching for Interacting Features for Spam Filtering
Searchng for Interactng Features for Spam Flterng Chuanlang Chen 1, Yun-Chao Gong 2, Rongfang Be 1,, and X. Z. Gao 3 1 Department of Computer Scence, Bejng Normal Unversty, Bejng 100875, Chna 2 Software
ERP Software Selection Using The Rough Set And TPOSIS Methods
ERP Software Selecton Usng The Rough Set And TPOSIS Methods Under Fuzzy Envronment Informaton Management Department, Hunan Unversty of Fnance and Economcs, No. 139, Fengln 2nd Road, Changsha, 410205, Chna
The Journal of Systems and Software
The Journal of Systems and Software 82 (2009) 241 252 Contents lsts avalable at ScenceDrect The Journal of Systems and Software journal homepage: www. elsever. com/ locate/ jss A study of project selecton
Automated information technology for ionosphere monitoring of low-orbit navigation satellite signals
Automated nformaton technology for onosphere montorng of low-orbt navgaton satellte sgnals Alexander Romanov, Sergey Trusov and Alexey Romanov Federal State Untary Enterprse Russan Insttute of Space Devce
SPEE Recommended Evaluation Practice #6 Definition of Decline Curve Parameters Background:
SPEE Recommended Evaluaton Practce #6 efnton of eclne Curve Parameters Background: The producton hstores of ol and gas wells can be analyzed to estmate reserves and future ol and gas producton rates and
Descriptive Models. Cluster Analysis. Example. General Applications of Clustering. Examples of Clustering Applications
CMSC828G Prncples of Data Mnng Lecture #9 Today s Readng: HMS, chapter 9 Today s Lecture: Descrptve Modelng Clusterng Algorthms Descrptve Models model presents the man features of the data, a global summary
Demographic and Health Surveys Methodology
samplng and household lstng manual Demographc and Health Surveys Methodology Ths document s part of the Demographc and Health Survey s DHS Toolkt of methodology for the MEASURE DHS Phase III project, mplemented
Automated Network Performance Management and Monitoring via One-class Support Vector Machine
Automated Network Performance Management and Montorng va One-class Support Vector Machne R. Zhang, J. Jang, and S. Zhang Dgtal Meda & Systems Research Insttute, Unversty of Bradford, UK Abstract: In ths
Course outline. Financial Time Series Analysis. Overview. Data analysis. Predictive signal. Trading strategy
Fnancal Tme Seres Analyss Patrck McSharry [email protected] www.mcsharry.net Trnty Term 2014 Mathematcal Insttute Unversty of Oxford Course outlne 1. Data analyss, probablty, correlatons, vsualsaton
Fuzzy TOPSIS Method in the Selection of Investment Boards by Incorporating Operational Risks
, July 6-8, 2011, London, U.K. Fuzzy TOPSIS Method n the Selecton of Investment Boards by Incorporatng Operatonal Rsks Elssa Nada Mad, and Abu Osman Md Tap Abstract Mult Crtera Decson Makng (MCDM) nvolves
1. Fundamentals of probability theory 2. Emergence of communication traffic 3. Stochastic & Markovian Processes (SP & MP)
6.3 / -- Communcaton Networks II (Görg) SS20 -- www.comnets.un-bremen.de Communcaton Networks II Contents. Fundamentals of probablty theory 2. Emergence of communcaton traffc 3. Stochastc & Markovan Processes
A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm
Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel
Construction Rules for Morningstar Canada Target Dividend Index SM
Constructon Rules for Mornngstar Canada Target Dvdend Index SM Mornngstar Methodology Paper October 2014 Verson 1.2 2014 Mornngstar, Inc. All rghts reserved. The nformaton n ths document s the property
Improved SVM in Cloud Computing Information Mining
Internatonal Journal of Grd Dstrbuton Computng Vol.8, No.1 (015), pp.33-40 http://dx.do.org/10.1457/jgdc.015.8.1.04 Improved n Cloud Computng Informaton Mnng Lvshuhong (ZhengDe polytechnc college JangSu
Selecting Best Employee of the Year Using Analytical Hierarchy Process
J. Basc. Appl. Sc. Res., 5(11)72-76, 2015 2015, TextRoad Publcaton ISSN 2090-4304 Journal of Basc and Appled Scentfc Research www.textroad.com Selectng Best Employee of the Year Usng Analytcal Herarchy
Traffic-light a stress test for life insurance provisions
MEMORANDUM Date 006-09-7 Authors Bengt von Bahr, Göran Ronge Traffc-lght a stress test for lfe nsurance provsons Fnansnspetonen P.O. Box 6750 SE-113 85 Stocholm [Sveavägen 167] Tel +46 8 787 80 00 Fax
CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK. Sample Stability Protocol
CHOLESTEROL REFERENCE METHOD LABORATORY NETWORK Sample Stablty Protocol Background The Cholesterol Reference Method Laboratory Network (CRMLN) developed certfcaton protocols for total cholesterol, HDL
Risk Model of Long-Term Production Scheduling in Open Pit Gold Mining
Rsk Model of Long-Term Producton Schedulng n Open Pt Gold Mnng R Halatchev 1 and P Lever 2 ABSTRACT Open pt gold mnng s an mportant sector of the Australan mnng ndustry. It uses large amounts of nvestments,
Open Access A Load Balancing Strategy with Bandwidth Constraint in Cloud Computing. Jing Deng 1,*, Ping Guo 2, Qi Li 3, Haizhu Chen 1
Send Orders for Reprnts to [email protected] The Open Cybernetcs & Systemcs Journal, 2014, 8, 115-121 115 Open Access A Load Balancng Strategy wth Bandwdth Constrant n Cloud Computng Jng Deng 1,*,
A study on the ability of Support Vector Regression and Neural Networks to Forecast Basic Time Series Patterns
A study on the ablty of Support Vector Regresson and Neural Networks to Forecast Basc Tme Seres Patterns Sven F. Crone, Jose Guajardo 2, and Rchard Weber 2 Lancaster Unversty, Department of Management
Vehicle Detection and Tracking in Video from Moving Airborne Platform
Journal of Computatonal Informaton Systems 10: 12 (2014) 4965 4972 Avalable at http://www.jofcs.com Vehcle Detecton and Trackng n Vdeo from Movng Arborne Platform Lye ZHANG 1,2,, Hua WANG 3, L LI 2 1 School
Brigid Mullany, Ph.D University of North Carolina, Charlotte
Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte
Multi-sensor Data Fusion for Cyber Security Situation Awareness
Avalable onlne at www.scencedrect.com Proceda Envronmental Scences 0 (20 ) 029 034 20 3rd Internatonal Conference on Envronmental 3rd Internatonal Conference on Envronmental Scence and Informaton Applcaton
Sensor placement for leak detection and location in water distribution networks
Sensor placement for leak detecton and locaton n water dstrbuton networks ABSTRACT R. Sarrate*, J. Blesa, F. Near, J. Quevedo Automatc Control Department, Unverstat Poltècnca de Catalunya, Rambla de Sant
Survival analysis methods in Insurance Applications in car insurance contracts
Survval analyss methods n Insurance Applcatons n car nsurance contracts Abder OULIDI 1 Jean-Mare MARION 2 Hervé GANACHAUD 3 Abstract In ths wor, we are nterested n survval models and ther applcatons on
