Managing the Knowledge Contained in Electronic Documents: a Clustering Method for Text Mining

Size: px
Start display at page:

Download "Managing the Knowledge Contained in Electronic Documents: a Clustering Method for Text Mining"

Transcription

1 Managing the Knowledge Contained in Electronic Documents: a Clustering Method for Text Mining S. Iiritano Getronics S.p.A. Rende (CS), Italy s_iiritano@yahoo.it M. Ruffolo Intersiel S.p.A. Via G. Rossini, Rende (CS), Italy m.ruffolo@telcal.it Abstract The huge amount of unstructured data available on the Web and the intranets creates today an information overloading problem. So, managing the knowledge contained in the textual documents is an important problem of Knowledge Management. Knowledge Extraction from collections of data is possible by Knowledge Discovery in Database (KDD), an interactive and iterative process focused on the exploration of data to discover new and interesting patterns within them. The fundamental phase of KDD process is Data Mining if data are in structured form and Text Mining when they are unstructured. This paper describes a prototype of a vertical corporate portal that implements a KDD process for knowledge extraction from unstructured data contained in textual documents. Text mining is realized through a clustering method that produces a partition of a set of documents on the basis of their contents characterized through the frequency of the words. 1. Introduction Using Knowledge Discovery in Database (KDD), where the fundamental step is Data Mining, knowledge workers can obtain important strategic information for their business. KDD has deeply transformed the methods to interrogate traditional databases, where data are in structured form, by automatically finding new and unknown patterns in huge quantity of data. However, structured data represent only a little part of the overall organization knowledge; in fact the major part of this knowledge is incorporated in textual documents. The amount of unstructured information in this form, accessible through the web, the intranets, the news groups etc. is enormously increased in last years. In this scenario the development of techniques and instruments of Knowledge Extraction, that are able to manage the knowledge contained in electronic textual documents, is a necessary task. This is possible through a KDD process based on Text Mining. A particular Text Mining approach is based on clustering techniques used to group documents according to their content. In this case the Knowledge extraction process is represented by the recognition of these groups. In this paper we present, in Section, a prototype of a vertical corporate portal that implements a KDD process as described above, in which documents are grouped together on the basis of word frequencies. Section 4 contains the results of two experiments carried out on a test corpus composed by articles extracted from some American newspapers and web publications, evaluated using measures defined in Section 3.. A Vertical Corporate Portal for Clustering Textual Documents Clustering methods are techniques for partitioning a set of objects in non-overlapped groups (clusters) on the base of suitable similarity measures. In the literature numerous clustering algorithms can be found [6] as well as a wide variety of similarity coefficients [7]. These techniques can be used in a KDD process to extract knowledge contained in (textual) documents as shown in Fig. 1. This work is supported by Laboratorio per l Innovazione dell Azione Progettuale Ricerca del Piano Telematico Calabria

2 D o c u m e n t A c q u i s i t i o n D o c u m e n t S o u r c e s D o c u m e n t P r e - P r o c e s s i n g R e p o s i t o r y D o c u m e n t C o r p u s T e x t M i n i n g S t r u c t u r e d D o c u m e n t s R e s u l t I n t e r p r e t a t i o n a n d R e f i n e m e n t C l u s t e r i n g K n o w l e d g e E x t r a c t i o n R e s u l t s Figure 1. The KDD process for unstructured data The KDD process is composed by four phases: Document Acquisition Phase: a collection of documents coming from various sources (Internet, company intranet, , etc.) is stored in a repository; Document Pre-processing: documents are submitted to a linguistic pre-processing based on term filtering and context analysis, then an internal representation based on word frequencies is produced; Text Mining: documents are partitioned in clusters; Results Interpretation and Refinement: clusters are submitted to the interpretation and refinement of a human operator. To implement this KDD process we developed a prototype of a Vertical Corporate Portal (VCP), composed by six modules that interact as shown in Fig.. I n t r a n e t VCP is designed to acquire documents internally or externally to the organization. Customer comments and communications, , news groups, manuals and program documentations, trade publications, internal search reports, know-how documents that are resident in the intranets can be submitted to the repository through the Document Collector that point to intranet site or directory to recognize them. Web sites containing information on competitors, market, products, technologies etc. can be acquired with the Crawler, an automatic agent that explores, periodically, selected web sites.. Document Pre-Processing The aim of this phase is to produce, for each input document, an internal representation suitable for text mining phase. The input is a set of m documents, and the output is a set of structured documents, one for each input document. A Document is a sequence of n words (n is the length of the document) and its vocabulary is the set of words V =w 1,,w λ occurring in ; the structured version of, denoted as D, is a set of pairs (w j,f D(w j)) where, for each j=1,,λ, f D(w j) represents the (relative) frequency of the word w j V in the document (i.e. the number of times that w j occurs in normalised w. r. t. the document length). In the VCP architecture the pre-processing phase is carried out by RM in three steps as shown in Fig. 3. W E B D o c u m e n t C o l l e c t o r C r a w l e r R e p o s i t o r y M a n a g e r R e p o s i t o r y C o r p u s S e l e c t o r C o r p u s T e x t C l a s s i f i e r A c q u i s i t i o n P r e - p r o c e s s i n g T e x t M i n i n g C o r p u s P a r t i t i o n Figure. Architecture of the VCP P a r t i t i o n A n a l y z e r R e s u l t s I n t e r p r e t a t i o n a n d R e f i n e m e n t The Crawler and the Document Collector allow the document acquisition phase; the Repository Manager (RM) provides to document pre-processing phase; the Corpus Selector (CS) and Text Classifier (TC) realize the text mining phase; Partition Analyser (PA) makes possible to interpret and to refine the results of the document clustering..1 Document Acquisition D o c u m e n t F i l t e r i n g C o n t e x t A n a l y s i s S t r u c t u r i n g Figure 3. Document pre-processing phase S t r u c t u r e d D o c u m e n t In particular: Filtering: discards from each document the additional words (articles, subjects, pronouns, prepositions) that are not interesting for the analysis; further, the remaining words are reduced to their stems excluding suffixes, prefixes, conjugations of the verbs, plurals. The result of this step is a Filtered document; Normalization: given a filtered document a context analysis is performed by which a synonymous is assigned to each word. The result of this step is a Normalized document. Context analysis is performed, only for English documents, using WordNet [18], developed at Princeton University.

3 WordNet is a lexical database that is able to individualize all meanings (senses) of a given English noun, adjective, verb or adverb finding its polysemies, antonyms, synonymies. The synonymous attributed to the term is the most close sense obtained considering the words in its around (context). However, to allow, also, the treatment of documents written in other languages, VCP is implemented so that context analysis can be excluded; Structuring: given a normalised document a Structured document is produced. At the end of this step RM updates two index files: Doc-Index which structure is (Cod_Doc, Reference, Number of words) and Word-Index which structure is (Word, Cod_Doc, Synonym, Synonym Frequency)..3 Text Mining This phase is carried out in two steps, Corpus Selection and Clustering. Corpus Selection. We call Corpus a set of structured documents. It can be formed by all documents in the repository or by a sub set of them, selected through a query. This step is performed by the Corpus Selector using information retrieval techniques on the files produced by RM. Clustering. The input to this phase is a corpus Ω and the output is a partition P = Γ 1,,Γ k of Ω where each Γ i term is called cluster. A cluster is a set of similar documents and, so, it s the sequence of h (cluster length) words occurring in all documents contained in it. As well as for documents the cluster vocabulary V Γ = w 1,,w η is the set of words occurring in Γ and the structured version C is the set of pairs (w j,f C(w j)), where, for each j=1,,η, f C(w j) represents the (relative) frequency of the word w j V Γ in the cluster Γ (i.e. the number of times w j occurs in Γ normalised w. r. t. the cluster length). P is obtained through a clustering technique in which the similarity coefficient is evaluated on the basis of the structured representation of clusters and documents, considering the frequency of the words included both in the document vocabulary V and in the cluster vocabulary V Γ. In the following we ll describe in detail the similarity measure and the clustering algorithm implemented in VCP..3.1 Similarity Measure. Let W = V V Γ = w 1,,w µ be the set of words present both in the document vocabulary V and in the cluster vocabulary V Γ. The similarity between the document D and the cluster C is measured as: µ ( Θ) S = Φ 1 (1) µ fd ) + fc ) Where measures the k = 1 k = 1 Φ = degree of overlapping of document vocabulary and cluster µ fd ) fc ) vocabulary, andθ = measures k = 1 µ the dissimilarity between common part of the document vocabulary and the cluster vocabulary. Note that S [0,1]. In fact: if V V Γ and f D(w k) = f C(w k) for any w k (k=1,,µ), then Φ=1, Θ=0 and S=1; if V V Γ =, then Φ=0 and S=0..3. Clustering Algorithm. The clustering method is illustrated through the following algorithm, written in a C- like code, based on concepts defined above.

4 Input: A structured corpus Ω Output: A partition P of Ω Initialization: Extracts a document D from Ω; Create a new cluster C 1 containing document D; P = C 1; Iteration: while (Ω ) do extracts document D i from Ω; extracts cluster C 1 from P; maxsimilarity=calculate_similarity(d i, C 1); //CL is a temporary cluster list used during work CL=P- C 1 ; while (CL )do extracts cluster C j from cluster list CL; maxsimilarity=maxmaxsimilarity, Calculate_Similarity(D i,c j); if (maxsimilarity < α) create a new cluster C that contains document D i; P = P C; Re_Control_Clusters(); else j = index of cluster for which maxsimilarity=calculate_similarity(d i, C j) C j = C j D i; In the algorithm are used the following functions: Re_Control_Clusters(). It is performed only when a new cluster is created. In this case all documents already assigned to the other clusters are re-controlled, and for each of them the similarity in comparison to the last produced cluster is determined. If it is greater than those in comparison to the cluster in which the document was assigned, the document is moved in the new cluster; Calculate_Similarity(D i, C j). It receives in input a structured document D i and a structured cluster C j and determines their similarity. The threshold value α was experimentally determined as Result Interpretation and Refinement This phase is realized by PA that shows for each cluster: the cluster length; the number of contained documents; a list of hyperlinks to these documents; a list of most representative words in the cluster ordered by frequency. PA, moreover, guides the user to explore more deeply the clusters and the documents contained for interpreting and refining the text mining results. 3. Measures of Performance: Precision, Coverage, F-Measure Performance evaluation of a clustering method is realized comparing the obtained (real) partition with the ideal one, manually recognized by a human operator that split documents in clusters on the basis of the homogeneity of their contents. In this Section we present some general formulas for performance evaluation of all clustering method. We consider measures referred to single clusters (comparative precision and comparative recall) and measures referred to the whole partition (total precision, total recall, F-Measure). 3.1 Comparative Precision and Comparative Recall Let P = Γ 1,, Γ be an ideal and P = Γ 1,, Γ ρ be a real partition of a corpus Ω. Using the symbol to denote the cardinality of a set, we can measure the comparative precision of the real cluster Γ j w. r. t. the ideal cluster Γ i as: Γ Γ Γ ' j i p ij =. () ' j The comparative recall of the real cluster Γ ' j w.r.t. the ideal cluster Γ i is evaluated as: r ij ' Γ j Γi =. (3) Γ i Comparative precision and comparative recall are used to evaluate total precision and total recall for the whole partition, comparing all real clusters with all ideal ones. 3. Total Precision and Total Recall With total precision and total recall we can evaluate the goodness of obtained partition analysing the composition of the real clusters most close to those ideal.

5 Total precision is measured as: P max( max p ij = 1,..., ρ = 1 = i j Total recall is measured as: R max ρ, ) = ρ = = 1 j 1,..., i r Maximum value of P and R is 1, whereas the minimum value depend of the distribution of objects in the real partition. 3.3 F-Measure The F-Misure [] is a standard metric that combines total precision and total recall into a number that represent the overall performance measure of the clustering method. It is equal to: F ( 1+ β ) = β ij PR P + R So much more F tends to 1, and so much good is the classification. The value of the parameter β establishes the relative importance of the recall in comparison to the precision. The importance of the recall is direcly proportional to the value assumed for β. Tipically performances are evaluated with different values of β; in our experiments we have assumed four values for this parameter: β=1.0 (P and R have the same importance); β=0.5 (R is half important ρ than P); β= (R has a double weight than P); β = (relative importance of R and P depends of the number of obtained clusters). 04. Experimental Results In this section we show results of an experiment carried out on a test corpus composed by 146 documents that represent articles extracted from the principal American newspapers (Boston Globe, Baltimore Sun, Chicago Tribune, Dallas Morning News, Herlad Tribune, Los (4) (5) (6) Angeles Times, New York Times, Washington Post, New York Post, USA Today), and publications on various themes (astronomy, electricity, economy, aerodynamics, etc.) published on internet sites. For this test set the ideal partition is formed by 0 clusters. The experiments was carried out with context analysis (AN-1) and without it (AN-). In Table 1 for each of experiment is shown the number of obtained clusters, the precision P, the recall R, and the F-measure for different values of β. Test AN- 1 AN- N of cluster P R F-Measure β=1 β=0.5 β= β=ρ/ 0,6 7 0,8 4 0,7 4 0,7 0,8 0,75 4 0,6 0,7 0,7 0,68 0,73 0, Table 1. Experimental results As expected, if the context analysis is performed we have better results. 5. Conclusion In this work we shown that knowledge extraction from unstructured data contained in textual documents is possible with a clustering approach, and that the implementation of a web Portal for described KDD process allows to deal with the information overloading problem. The context analysis step and the classification step are realized with heuristics and can be re-designed to improve performances of VCP, as well as it s possible to extend the text mining phase integrating different techniques. Measures proposed in Section 3 are general, and can be used for evaluate performance for all clustering techniques. References [1] Text Mining and the Knowledge Management Space Version, SEMIO Corporation, 1998, California. [] M. Lenz, Managing the Knowledge Contained in Technical Documents, Proc. Of the Second Int. Conf. On Practical Aspects of Knowledge Management (PAKM98), Basel, Switzerland, 9-30 Oct

6 [3] R. Feldman and Al. Knowledge Management: a Text Mining Approach, Proc. Of the Second Int. Conf. On Practical Aspects of Knowledge Management (PAKM98), Basel, Switzerland, 9-30 Oct [4] C. E. Shannon and W. Weaver, La Teoria Matematica delle Comunicazioni, Etas Kompass, [5] B. Everitt, Cluster Analysis, Sage Publication Inc., Beverly Hills, [6] J. A. Hartigan, Clustering Algorithms, John Wiley and Sons, USA, [7] L. Kaufman, P. J. Rousseeuw, Finding Groups in Data, John Wiley and Sons, USA, [8] Doerre, Gersl, Seiffert, Text Mining Finding Nuggets in Mountains of Textual Data, KDD99 proceedings ACM, [9] L. Fahey, Competitors, John Wiley and Sons, USA, [10] C. J. Van Risbergen, W. B. Groft, Documents Clustering: an Evaluation of Some Experiments with the Cranfield Collection, Information Processing and Management, 1975, pp [11] A. Griffiths, H.C. Luckhurst, P. Willet, Using interdocument Similarity Information in Document Retrieval System, Journal of the American Society for Information Science, 1986, vol. 37 pp [1] B. S. Duran, P.L. Odell, Cluster Analysis: a Survey - Springer-Verlag, Berlin, [13] W. J. Frawley, G. Piatesky-Shapiro, C. Matheus, Knowledge Discovery in Databases: an Overview, AI Magazine, 199, pp [14] T. H. Davenport, L. Prusak, Working Knowledge, Boston Harvard Business School Press, [15] W. Eckerson, Analyst Insight Business Portal, June [16] P. D. Henig, Vertical Portals Aim for World Domination, Red Herring Online. [17] D. Gilmore, Some timely guidelines for web design, Mercury News Technology. [18] WordNet: An Electronic Lexical Database, MIT Press. [19] M. Davidson, The Transformation of Management, Butterworth-Heinemann, [0] I. Nonaka, A Dynamic Theory of Organizational Knowledge Creation, Organizational Science, February 1994, Vol. 5 n 1. [1] J. Duncan Davison, Java Servlet API Specification ver..1, Public Review Draft, Sun Microsystem, October [] E. Riloff and W. Lehnert, Information Extraction as a Basis for High-Precision Text Classification, ACM Transaction on Information System, July 1994, vol. 1, No. 3, pp

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words

Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words , pp.290-295 http://dx.doi.org/10.14257/astl.2015.111.55 Efficient Techniques for Improved Data Classification and POS Tagging by Monitoring Extraction, Pruning and Updating of Unknown Foreign Words Irfan

More information

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework

Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Effective Data Retrieval Mechanism Using AML within the Web Based Join Framework Usha Nandini D 1, Anish Gracias J 2 1 ushaduraisamy@yahoo.co.in 2 anishgracias@gmail.com Abstract A vast amount of assorted

More information

Clustering Technique in Data Mining for Text Documents

Clustering Technique in Data Mining for Text Documents Clustering Technique in Data Mining for Text Documents Ms.J.Sathya Priya Assistant Professor Dept Of Information Technology. Velammal Engineering College. Chennai. Ms.S.Priyadharshini Assistant Professor

More information

Search and Information Retrieval

Search and Information Retrieval Search and Information Retrieval Search on the Web 1 is a daily activity for many people throughout the world Search and communication are most popular uses of the computer Applications involving search

More information

Interactive Dynamic Information Extraction

Interactive Dynamic Information Extraction Interactive Dynamic Information Extraction Kathrin Eichler, Holmer Hemsen, Markus Löckelt, Günter Neumann, and Norbert Reithinger Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI, 66123 Saarbrücken

More information

HELSINKI UNIVERSITY OF TECHNOLOGY 26.1.2005 T-86.141 Enterprise Systems Integration, 2001. Data warehousing and Data mining: an Introduction

HELSINKI UNIVERSITY OF TECHNOLOGY 26.1.2005 T-86.141 Enterprise Systems Integration, 2001. Data warehousing and Data mining: an Introduction HELSINKI UNIVERSITY OF TECHNOLOGY 26.1.2005 T-86.141 Enterprise Systems Integration, 2001. Data warehousing and Data mining: an Introduction Federico Facca, Alessandro Gallo, federico@grafedi.it sciack@virgilio.it

More information

Data Mining Project Report. Document Clustering. Meryem Uzun-Per

Data Mining Project Report. Document Clustering. Meryem Uzun-Per Data Mining Project Report Document Clustering Meryem Uzun-Per 504112506 Table of Content Table of Content... 2 1. Project Definition... 3 2. Literature Survey... 3 3. Methods... 4 3.1. K-means algorithm...

More information

Universal. Event. Product. Computer. 1 warehouse.

Universal. Event. Product. Computer. 1 warehouse. Dynamic multi-dimensional models for text warehouses Maria Zamr Bleyberg, Karthik Ganesh Computing and Information Sciences Department Kansas State University, Manhattan, KS, 66506 Abstract In this paper,

More information

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

Data Mining in Web Search Engine Optimization and User Assisted Rank Results Data Mining in Web Search Engine Optimization and User Assisted Rank Results Minky Jindal Institute of Technology and Management Gurgaon 122017, Haryana, India Nisha kharb Institute of Technology and Management

More information

Domain Classification of Technical Terms Using the Web

Domain Classification of Technical Terms Using the Web Systems and Computers in Japan, Vol. 38, No. 14, 2007 Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J89-D, No. 11, November 2006, pp. 2470 2482 Domain Classification of Technical Terms Using

More information

F. Aiolli - Sistemi Informativi 2007/2008

F. Aiolli - Sistemi Informativi 2007/2008 Text Categorization Text categorization (TC - aka text classification) is the task of buiding text classifiers, i.e. sofware systems that classify documents from a domain D into a given, fixed set C =

More information

SPATIAL DATA CLASSIFICATION AND DATA MINING

SPATIAL DATA CLASSIFICATION AND DATA MINING , pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal

More information

Projektgruppe. Categorization of text documents via classification

Projektgruppe. Categorization of text documents via classification Projektgruppe Steffen Beringer Categorization of text documents via classification 4. Juni 2010 Content Motivation Text categorization Classification in the machine learning Document indexing Construction

More information

Load Balancing in Structured Peer to Peer Systems

Load Balancing in Structured Peer to Peer Systems Load Balancing in Structured Peer to Peer Systems DR.K.P.KALIYAMURTHIE 1, D.PARAMESWARI 2 Professor and Head, Dept. of IT, Bharath University, Chennai-600 073 1 Asst. Prof. (SG), Dept. of Computer Applications,

More information

Load Balancing in Structured Peer to Peer Systems

Load Balancing in Structured Peer to Peer Systems Load Balancing in Structured Peer to Peer Systems Dr.K.P.Kaliyamurthie 1, D.Parameswari 2 1.Professor and Head, Dept. of IT, Bharath University, Chennai-600 073. 2.Asst. Prof.(SG), Dept. of Computer Applications,

More information

Business Intelligence and Decision Support Systems

Business Intelligence and Decision Support Systems Chapter 12 Business Intelligence and Decision Support Systems Information Technology For Management 7 th Edition Turban & Volonino Based on lecture slides by L. Beaubien, Providence College John Wiley

More information

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it

Web Mining. Margherita Berardi LACAM. Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Web Mining Margherita Berardi LACAM Dipartimento di Informatica Università degli Studi di Bari berardi@di.uniba.it Bari, 24 Aprile 2003 Overview Introduction Knowledge discovery from text (Web Content

More information

C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER

C o p yr i g ht 2015, S A S I nstitute Inc. A l l r i g hts r eser v ed. INTRODUCTION TO SAS TEXT MINER INTRODUCTION TO SAS TEXT MINER TODAY S AGENDA INTRODUCTION TO SAS TEXT MINER Define data mining Overview of SAS Enterprise Miner Describe text analytics and define text data mining Text Mining Process

More information

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior

Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), jagatheshwaran.n@gmail.com, Velalar College of Engineering and Technology,

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS

More information

Building A Smart Academic Advising System Using Association Rule Mining

Building A Smart Academic Advising System Using Association Rule Mining Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

α α λ α = = λ λ α ψ = = α α α λ λ ψ α = + β = > θ θ β > β β θ θ θ β θ β γ θ β = γ θ > β > γ θ β γ = θ β = θ β = θ β = β θ = β β θ = = = β β θ = + α α α α α = = λ λ λ λ λ λ λ = λ λ α α α α λ ψ + α =

More information

Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Facilitating Business Process Discovery using Email Analysis Matin Mavaddat Matin.Mavaddat@live.uwe.ac.uk Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

More information

CLUSTERING FOR FORENSIC ANALYSIS

CLUSTERING FOR FORENSIC ANALYSIS IMPACT: International Journal of Research in Engineering & Technology (IMPACT: IJRET) ISSN(E): 2321-8843; ISSN(P): 2347-4599 Vol. 2, Issue 4, Apr 2014, 129-136 Impact Journals CLUSTERING FOR FORENSIC ANALYSIS

More information

Big Data Text Mining and Visualization. Anton Heijs

Big Data Text Mining and Visualization. Anton Heijs Copyright 2007 by Treparel Information Solutions BV. This report nor any part of it may be copied, circulated, quoted without prior written approval from Treparel7 Treparel Information Solutions BV Delftechpark

More information

A Stock Pattern Recognition Algorithm Based on Neural Networks

A Stock Pattern Recognition Algorithm Based on Neural Networks A Stock Pattern Recognition Algorithm Based on Neural Networks Xinyu Guo guoxinyu@icst.pku.edu.cn Xun Liang liangxun@icst.pku.edu.cn Xiang Li lixiang@icst.pku.edu.cn Abstract pattern respectively. Recent

More information

A Business Process Services Portal

A Business Process Services Portal A Business Process Services Portal IBM Research Report RZ 3782 Cédric Favre 1, Zohar Feldman 3, Beat Gfeller 1, Thomas Gschwind 1, Jana Koehler 1, Jochen M. Küster 1, Oleksandr Maistrenko 1, Alexandru

More information

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING

A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING A FUZZY BASED APPROACH TO TEXT MINING AND DOCUMENT CLUSTERING Sumit Goswami 1 and Mayank Singh Shishodia 2 1 Indian Institute of Technology-Kharagpur, Kharagpur, India sumit_13@yahoo.com 2 School of Computer

More information

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning

More information

Determining optimal window size for texture feature extraction methods

Determining optimal window size for texture feature extraction methods IX Spanish Symposium on Pattern Recognition and Image Analysis, Castellon, Spain, May 2001, vol.2, 237-242, ISBN: 84-8021-351-5. Determining optimal window size for texture feature extraction methods Domènec

More information

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

131-1. Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10 1/10 131-1 Adding New Level in KDD to Make the Web Usage Mining More Efficient Mohammad Ala a AL_Hamami PHD Student, Lecturer m_ah_1@yahoocom Soukaena Hassan Hashem PHD Student, Lecturer soukaena_hassan@yahoocom

More information

Dynamic Data in terms of Data Mining Streams

Dynamic Data in terms of Data Mining Streams International Journal of Computer Science and Software Engineering Volume 2, Number 1 (2015), pp. 1-6 International Research Publication House http://www.irphouse.com Dynamic Data in terms of Data Mining

More information

Cloud Storage-based Intelligent Document Archiving for the Management of Big Data

Cloud Storage-based Intelligent Document Archiving for the Management of Big Data Cloud Storage-based Intelligent Document Archiving for the Management of Big Data Keedong Yoo Dept. of Management Information Systems Dankook University Cheonan, Republic of Korea Abstract : The cloud

More information

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING

dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING dm106 TEXT MINING FOR CUSTOMER RELATIONSHIP MANAGEMENT: AN APPROACH BASED ON LATENT SEMANTIC ANALYSIS AND FUZZY CLUSTERING ABSTRACT In most CRM (Customer Relationship Management) systems, information on

More information

KNOWLEDGE GRID An Architecture for Distributed Knowledge Discovery

KNOWLEDGE GRID An Architecture for Distributed Knowledge Discovery KNOWLEDGE GRID An Architecture for Distributed Knowledge Discovery Mario Cannataro 1 and Domenico Talia 2 1 ICAR-CNR 2 DEIS Via P. Bucci, Cubo 41-C University of Calabria 87036 Rende (CS) Via P. Bucci,

More information

Data Pre-Processing in Spam Detection

Data Pre-Processing in Spam Detection IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 11 May 2015 ISSN (online): 2349-784X Data Pre-Processing in Spam Detection Anjali Sharma Dr. Manisha Manisha Dr. Rekha Jain

More information

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches

Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches Modelling, Extraction and Description of Intrinsic Cues of High Resolution Satellite Images: Independent Component Analysis based approaches PhD Thesis by Payam Birjandi Director: Prof. Mihai Datcu Problematic

More information

Cross-lingual Synonymy Overlap

Cross-lingual Synonymy Overlap Cross-lingual Synonymy Overlap Anca Dinu 1, Liviu P. Dinu 2, Ana Sabina Uban 2 1 Faculty of Foreign Languages and Literatures, University of Bucharest 2 Faculty of Mathematics and Computer Science, University

More information

The Enron Corpus: A New Dataset for Email Classification Research

The Enron Corpus: A New Dataset for Email Classification Research The Enron Corpus: A New Dataset for Email Classification Research Bryan Klimt and Yiming Yang Language Technologies Institute Carnegie Mellon University Pittsburgh, PA 15213-8213, USA {bklimt,yiming}@cs.cmu.edu

More information

Designing an Object Relational Data Warehousing System: Project ORDAWA * (Extended Abstract)

Designing an Object Relational Data Warehousing System: Project ORDAWA * (Extended Abstract) Designing an Object Relational Data Warehousing System: Project ORDAWA * (Extended Abstract) Johann Eder 1, Heinz Frank 1, Tadeusz Morzy 2, Robert Wrembel 2, Maciej Zakrzewicz 2 1 Institut für Informatik

More information

Healthcare Measurement Analysis Using Data mining Techniques

Healthcare Measurement Analysis Using Data mining Techniques www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik

More information

Chapter ML:XI. XI. Cluster Analysis

Chapter ML:XI. XI. Cluster Analysis Chapter ML:XI XI. Cluster Analysis Data Mining Overview Cluster Analysis Basics Hierarchical Cluster Analysis Iterative Cluster Analysis Density-Based Cluster Analysis Cluster Evaluation Constrained Cluster

More information

An ontology-based approach for semantic ranking of the web search engines results

An ontology-based approach for semantic ranking of the web search engines results An ontology-based approach for semantic ranking of the web search engines results Editor(s): Name Surname, University, Country Solicited review(s): Name Surname, University, Country Open review(s): Name

More information

Big Data: Rethinking Text Visualization

Big Data: Rethinking Text Visualization Big Data: Rethinking Text Visualization Dr. Anton Heijs anton.heijs@treparel.com Treparel April 8, 2013 Abstract In this white paper we discuss text visualization approaches and how these are important

More information

Internet of Things, data management for healthcare applications. Ontology and automatic classifications

Internet of Things, data management for healthcare applications. Ontology and automatic classifications Internet of Things, data management for healthcare applications. Ontology and automatic classifications Inge.Krogstad@nor.sas.com SAS Institute Norway Different challenges same opportunities! Data capture

More information

Collecting Polish German Parallel Corpora in the Internet

Collecting Polish German Parallel Corpora in the Internet Proceedings of the International Multiconference on ISSN 1896 7094 Computer Science and Information Technology, pp. 285 292 2007 PIPS Collecting Polish German Parallel Corpora in the Internet Monika Rosińska

More information

PoS-tagging Italian texts with CORISTagger

PoS-tagging Italian texts with CORISTagger PoS-tagging Italian texts with CORISTagger Fabio Tamburini DSLO, University of Bologna, Italy fabio.tamburini@unibo.it Abstract. This paper presents an evolution of CORISTagger [1], an high-performance

More information

Data mining and complex telecommunications problems modeling

Data mining and complex telecommunications problems modeling Paper Data mining and complex telecommunications problems modeling Janusz Granat Abstract The telecommunications operators have to manage one of the most complex systems developed by human beings. Moreover,

More information

Twitter Analytics: Architecture, Tools and Analysis

Twitter Analytics: Architecture, Tools and Analysis Twitter Analytics: Architecture, Tools and Analysis Rohan D.W Perera CERDEC Ft Monmouth, NJ 07703-5113 S. Anand, K. P. Subbalakshmi and R. Chandramouli Department of ECE, Stevens Institute Of Technology

More information

Error Log Processing for Accurate Failure Prediction. Humboldt-Universität zu Berlin

Error Log Processing for Accurate Failure Prediction. Humboldt-Universität zu Berlin Error Log Processing for Accurate Failure Prediction Felix Salfner ICSI Berkeley Steffen Tschirpke Humboldt-Universität zu Berlin Introduction Context of work: Error-based online failure prediction: error

More information

An Information Retrieval using weighted Index Terms in Natural Language document collections

An Information Retrieval using weighted Index Terms in Natural Language document collections Internet and Information Technology in Modern Organizations: Challenges & Answers 635 An Information Retrieval using weighted Index Terms in Natural Language document collections Ahmed A. A. Radwan, Minia

More information

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications

Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Mining Signatures in Healthcare Data Based on Event Sequences and its Applications Siddhanth Gokarapu 1, J. Laxmi Narayana 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India 1

More information

Supply chain management by means of FLM-rules

Supply chain management by means of FLM-rules Supply chain management by means of FLM-rules Nicolas Le Normand, Julien Boissière, Nicolas Méger, Lionel Valet LISTIC Laboratory - Polytech Savoie Université de Savoie B.P. 80439 F-74944 Annecy-Le-Vieux,

More information

Deposit Identification Utility and Visualization Tool

Deposit Identification Utility and Visualization Tool Deposit Identification Utility and Visualization Tool Colorado School of Mines Field Session Summer 2014 David Alexander Jeremy Kerr Luke McPherson Introduction Newmont Mining Corporation was founded in

More information

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining

Lluis Belanche + Alfredo Vellido. Intelligent Data Analysis and Data Mining Lluis Belanche + Alfredo Vellido Intelligent Data Analysis and Data Mining a.k.a. Data Mining II Office 319, Omega, BCN EET, office 107, TR 2, Terrassa avellido@lsi.upc.edu skype, gtalk: avellido Tels.:

More information

Data Mining: A Preprocessing Engine

Data Mining: A Preprocessing Engine Journal of Computer Science 2 (9): 735-739, 2006 ISSN 1549-3636 2005 Science Publications Data Mining: A Preprocessing Engine Luai Al Shalabi, Zyad Shaaban and Basel Kasasbeh Applied Science University,

More information

A Framework for Intelligent Online Customer Service System

A Framework for Intelligent Online Customer Service System A Framework for Intelligent Online Customer Service System Yiping WANG Yongjin ZHANG School of Business Administration, Xi an University of Technology Abstract: In a traditional customer service support

More information

Stock Market Prediction Using Data Mining

Stock Market Prediction Using Data Mining Stock Market Prediction Using Data Mining 1 Ruchi Desai, 2 Prof.Snehal Gandhi 1 M.E., 2 M.Tech. 1 Computer Department 1 Sarvajanik College of Engineering and Technology, Surat, Gujarat, India Abstract

More information

Open Domain Information Extraction. Günter Neumann, DFKI, 2012

Open Domain Information Extraction. Günter Neumann, DFKI, 2012 Open Domain Information Extraction Günter Neumann, DFKI, 2012 Improving TextRunner Wu and Weld (2010) Open Information Extraction using Wikipedia, ACL 2010 Fader et al. (2011) Identifying Relations for

More information

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2

Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Optimization of Search Results with Duplicate Page Elimination using Usage Data A. K. Sharma 1, Neelam Duhan 2 1, 2 Department of Computer Engineering, YMCA University of Science & Technology, Faridabad,

More information

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE

NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE www.arpapress.com/volumes/vol13issue3/ijrras_13_3_18.pdf NEW TECHNIQUE TO DEAL WITH DYNAMIC DATA MINING IN THE DATABASE Hebah H. O. Nasereddin Middle East University, P.O. Box: 144378, Code 11814, Amman-Jordan

More information

jeti: A Tool for Remote Tool Integration

jeti: A Tool for Remote Tool Integration jeti: A Tool for Remote Tool Integration Tiziana Margaria 1, Ralf Nagel 2, and Bernhard Steffen 2 1 Service Engineering for Distributed Systems, Institute for Informatics, University of Göttingen, Germany

More information

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type.

Three types of messages: A, B, C. Assume A is the oldest type, and C is the most recent type. Chronological Sampling for Email Filtering Ching-Lung Fu 2, Daniel Silver 1, and James Blustein 2 1 Acadia University, Wolfville, Nova Scotia, Canada 2 Dalhousie University, Halifax, Nova Scotia, Canada

More information

Customer Intentions Analysis of Twitter Based on Semantic Patterns

Customer Intentions Analysis of Twitter Based on Semantic Patterns Customer Intentions Analysis of Twitter Based on Semantic Patterns Mohamed Hamroun mohamed.hamrounn@gmail.com Mohamed Salah Gouider ms.gouider@yahoo.fr Lamjed Ben Said lamjed.bensaid@isg.rnu.tn ABSTRACT

More information

Course Design Document. IS414: Search Engine Technologies

Course Design Document. IS414: Search Engine Technologies Course Design Document IS414: Search Engine Technologies Version 2.7 6 June 2011 IS414 Search Engine Technologies Page 1 Table of Contents 1. Revision History... 3 2. Overview of the Search Engine Technologies

More information

A Survey of Text Mining Techniques and Applications

A Survey of Text Mining Techniques and Applications 60 JOURNAL OF EMERGING TECHNOLOGIES IN WEB INTELLIGENCE, VOL. 1, NO. 1, AUGUST 2009 A Survey of Text Mining Techniques and Applications Vishal Gupta Lecturer Computer Science & Engineering, University

More information

Mining Text Data: An Introduction

Mining Text Data: An Introduction Bölüm 10. Metin ve WEB Madenciliği http://ceng.gazi.edu.tr/~ozdemir Mining Text Data: An Introduction Data Mining / Knowledge Discovery Structured Data Multimedia Free Text Hypertext HomeLoan ( Frank Rizzo

More information

EXPLOITING FOLKSONOMIES AND ONTOLOGIES IN AN E-BUSINESS APPLICATION

EXPLOITING FOLKSONOMIES AND ONTOLOGIES IN AN E-BUSINESS APPLICATION EXPLOITING FOLKSONOMIES AND ONTOLOGIES IN AN E-BUSINESS APPLICATION Anna Goy and Diego Magro Dipartimento di Informatica, Università di Torino C. Svizzera, 185, I-10149 Italy ABSTRACT This paper proposes

More information

Word Taxonomy for On-line Visual Asset Management and Mining

Word Taxonomy for On-line Visual Asset Management and Mining Word Taxonomy for On-line Visual Asset Management and Mining Osmar R. Zaïane * Eli Hagen ** Jiawei Han ** * Department of Computing Science, University of Alberta, Canada, zaiane@cs.uaberta.ca ** School

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

Importance or the Role of Data Warehousing and Data Mining in Business Applications

Importance or the Role of Data Warehousing and Data Mining in Business Applications Journal of The International Association of Advanced Technology and Science Importance or the Role of Data Warehousing and Data Mining in Business Applications ATUL ARORA ANKIT MALIK Abstract Information

More information

Natural Language to Relational Query by Using Parsing Compiler

Natural Language to Relational Query by Using Parsing Compiler Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 3, March 2015,

More information

Automating Legal Research through Data Mining

Automating Legal Research through Data Mining Automating Legal Research through Data Mining M.F.M Firdhous, Faculty of Information Technology, University of Moratuwa, Moratuwa, Sri Lanka. Mohamed.Firdhous@uom.lk Abstract The term legal research generally

More information

Standardization of Components, Products and Processes with Data Mining

Standardization of Components, Products and Processes with Data Mining B. Agard and A. Kusiak, Standardization of Components, Products and Processes with Data Mining, International Conference on Production Research Americas 2004, Santiago, Chile, August 1-4, 2004. Standardization

More information

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Understanding Web personalization with Web Usage Mining and its Application: Recommender System Understanding Web personalization with Web Usage Mining and its Application: Recommender System Manoj Swami 1, Prof. Manasi Kulkarni 2 1 M.Tech (Computer-NIMS), VJTI, Mumbai. 2 Department of Computer Technology,

More information

Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs

Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Interactive Recovery of Requirements Traceability Links Using User Feedback and Configuration Management Logs Ryosuke Tsuchiya 1, Hironori Washizaki 1, Yoshiaki Fukazawa 1, Keishi Oshima 2, and Ryota Mibe

More information

Personalization of Web Search With Protected Privacy

Personalization of Web Search With Protected Privacy Personalization of Web Search With Protected Privacy S.S DIVYA, R.RUBINI,P.EZHIL Final year, Information Technology,KarpagaVinayaga College Engineering and Technology, Kanchipuram [D.t] Final year, Information

More information

Standardization and Its Effects on K-Means Clustering Algorithm

Standardization and Its Effects on K-Means Clustering Algorithm Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03

More information

PartJoin: An Efficient Storage and Query Execution for Data Warehouses

PartJoin: An Efficient Storage and Query Execution for Data Warehouses PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE ladjel@imerir.com 2

More information

Knowledge Management

Knowledge Management Knowledge Management Management Information Code: 164292-02 Course: Management Information Period: Autumn 2013 Professor: Sync Sangwon Lee, Ph. D D. of Information & Electronic Commerce 1 00. Contents

More information

Building a Question Classifier for a TREC-Style Question Answering System

Building a Question Classifier for a TREC-Style Question Answering System Building a Question Classifier for a TREC-Style Question Answering System Richard May & Ari Steinberg Topic: Question Classification We define Question Classification (QC) here to be the task that, given

More information

Scalable Parallel Clustering for Data Mining on Multicomputers

Scalable Parallel Clustering for Data Mining on Multicomputers Scalable Parallel Clustering for Data Mining on Multicomputers D. Foti, D. Lipari, C. Pizzuti and D. Talia ISI-CNR c/o DEIS, UNICAL 87036 Rende (CS), Italy {pizzuti,talia}@si.deis.unical.it Abstract. This

More information

Discretization and grouping: preprocessing steps for Data Mining

Discretization and grouping: preprocessing steps for Data Mining Discretization and grouping: preprocessing steps for Data Mining PetrBerka 1 andivanbruha 2 1 LaboratoryofIntelligentSystems Prague University of Economic W. Churchill Sq. 4, Prague CZ 13067, Czech Republic

More information

Role of Text Mining in Business Intelligence

Role of Text Mining in Business Intelligence Role of Text Mining in Business Intelligence Palak Gupta 1, Barkha Narang 2 Abstract This paper includes the combined study of business intelligence and text mining of uncertain data. The data that is

More information

Knowledge Discovery from patents using KMX Text Analytics

Knowledge Discovery from patents using KMX Text Analytics Knowledge Discovery from patents using KMX Text Analytics Dr. Anton Heijs anton.heijs@treparel.com Treparel Abstract In this white paper we discuss how the KMX technology of Treparel can help searchers

More information

Online Failure Prediction in Cloud Datacenters

Online Failure Prediction in Cloud Datacenters Online Failure Prediction in Cloud Datacenters Yukihiro Watanabe Yasuhide Matsumoto Once failures occur in a cloud datacenter accommodating a large number of virtual resources, they tend to spread rapidly

More information

Blog Post Extraction Using Title Finding

Blog Post Extraction Using Title Finding Blog Post Extraction Using Title Finding Linhai Song 1, 2, Xueqi Cheng 1, Yan Guo 1, Bo Wu 1, 2, Yu Wang 1, 2 1 Institute of Computing Technology, Chinese Academy of Sciences, Beijing 2 Graduate School

More information

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) (707.004) Roman Kern. KTI, TU Graz 2015-03-05 Ensemble Methods Knowledge Discovery and Data Mining 2 (VU) (707004) Roman Kern KTI, TU Graz 2015-03-05 Roman Kern (KTI, TU Graz) Ensemble Methods 2015-03-05 1 / 38 Outline 1 Introduction 2 Classification

More information

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM Thanh-Nghi Do College of Information Technology, Cantho University 1 Ly Tu Trong Street, Ninh Kieu District Cantho City, Vietnam

More information

Less naive Bayes spam detection

Less naive Bayes spam detection Less naive Bayes spam detection Hongming Yang Eindhoven University of Technology Dept. EE, Rm PT 3.27, P.O.Box 53, 5600MB Eindhoven The Netherlands. E-mail:h.m.yang@tue.nl also CoSiNe Connectivity Systems

More information

Implementing Heuristic Miner for Different Types of Event Logs

Implementing Heuristic Miner for Different Types of Event Logs Implementing Heuristic Miner for Different Types of Event Logs Angelina Prima Kurniati 1, GunturPrabawa Kusuma 2, GedeAgungAry Wisudiawan 3 1,3 School of Compuing, Telkom University, Indonesia. 2 School

More information

Semantic annotation of requirements for automatic UML class diagram generation

Semantic annotation of requirements for automatic UML class diagram generation www.ijcsi.org 259 Semantic annotation of requirements for automatic UML class diagram generation Soumaya Amdouni 1, Wahiba Ben Abdessalem Karaa 2 and Sondes Bouabid 3 1 University of tunis High Institute

More information

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 5, Issue 4, April-2014 442 Over viewing issues of data mining with highlights of data warehousing Rushabh H. Baldaniya, Prof H.J.Baldaniya,

More information

Term extraction for user profiling: evaluation by the user

Term extraction for user profiling: evaluation by the user Term extraction for user profiling: evaluation by the user Suzan Verberne 1, Maya Sappelli 1,2, Wessel Kraaij 1,2 1 Institute for Computing and Information Sciences, Radboud University Nijmegen 2 TNO,

More information

Mining the Software Change Repository of a Legacy Telephony System

Mining the Software Change Repository of a Legacy Telephony System Mining the Software Change Repository of a Legacy Telephony System Jelber Sayyad Shirabad, Timothy C. Lethbridge, Stan Matwin School of Information Technology and Engineering University of Ottawa, Ottawa,

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information