Cluster analysis and Association analysis for the same data


Huaiguo Fu
Telecommunications Software & Systems Group
Waterford Institute of Technology
Waterford, Ireland

Abstract: Both cluster analysis and association analysis are important data mining tasks. Some applications need both for the same data, and each task is very time-consuming on large data. To reduce the cost of the two mining tasks on large sets of transactions, we propose a strategy that unifies cluster analysis and association analysis. This paper also presents a new core algorithm for this strategy, suited to large and high-dimensional data. The experimental results show the efficiency of this algorithm.

Key Words: Association analysis, Clustering, Closed set, Concept lattice, Algorithm

1 Introduction

Both cluster analysis and association analysis are important tasks of data mining. In recent years they have attracted a lot of attention in research and applications. They play an important role in data mining applications such as text mining, Web mining, information retrieval, biomedical informatics, and many others. A variety of techniques and approaches for cluster analysis and association analysis have been developed and successfully applied to real-life data mining problems. However, as data continue to grow inexorably in size and complexity, these techniques and approaches face challenges such as very large data, high-dimensional data, distributed heterogeneous data, and complex data. In some applications, we need both cluster analysis and association analysis for the same data, and each task is very time-consuming on large data.
Although cluster analysis and association analysis are separate tasks in research and applications, we propose to unify them for mining databases of transactions in order to reduce the high cost of the mining tasks. This is the key motivation for unifying cluster analysis and association analysis. Furthermore, the two can be unified for databases of transactions for the following reasons:

1) Both analyze the relationship between the elements of a data set. In fact, the two tasks extract the same essential relationship: similarity. Only the description and bounds of the relationship differ. A frequent pattern reveals one kind of similarity between elements of data. Cluster analysis may reveal associations and relationships in data that contribute to mining models or rules from the data. So the elements in a frequent pattern are similar, and similar elements may have the same frequency.

2) Mining closed sets can be an essential step for both cluster analysis and association analysis on transactional data. Existing work shows that clusters and frequent patterns can be extracted from closed sets [2, 15]. Cluster analysis and association analysis can share the closed sets mined from the same data set, so the closed sets need not be extracted separately for each task.

3) Closed set mining provides a way to interpret the clusters and frequent patterns. For most techniques and approaches of cluster analysis and association analysis, it is hard to interpret the mining results. For example, it is hard to interpret the clusters and frequent patterns produced by existing mining techniques, and it is also hard to give the meaning of the distance measure in most clustering methods. Closed sets are derived from formal concept analysis (FCA), and formal concepts can help us interpret the closed sets.
Closed set mining facilitates pattern interpretation. In human thinking, objects are grouped by concepts and attributes, and we can interpret attribute patterns and object patterns with concepts. Concept-based methods can therefore be used to interpret clusters and frequent patterns.

In this paper, the idea of unifying cluster analysis and association analysis focuses on databases of transactions. The main framework of the idea is:

- Generate the data context, with descriptions of items or transactions, from the database of transactions.
- Mine the closed sets, and the lattice of closed sets, of the database of transactions with FCA.
- Add extended information such as support, similarity and interpretation to each closed set. We propose a new structure for each node of the lattice: the node contains the attribute set, the object set, the number of objects, the support and a similarity description.
- Generate the clusters and closed frequent patterns together with their interpretation.

The core of FCA is the concept lattice, whose theoretical foundation is mathematical lattice theory [1, 8]. A lattice is a popular mathematical structure for modeling conceptual hierarchies, and a concept lattice is a method for deriving conceptual structures from data. It allows us to analyze and mine complex data for tasks such as classification [11, 13], association rule mining [6, 7] and clustering [10, 9, 4]. Because of the high dimension and large volume of data, we need more scalable and more efficient techniques to analyze and represent large and high-dimensional data sets. In this paper we present a new algorithm to analyze large and high-dimensional data.

The rest of this paper is organized as follows. Basic definitions for unifying cluster analysis and association analysis are presented in the next section. The framework of unifying cluster analysis and association analysis is introduced in section 3. In section 4, we present a new algorithm. Section 5 shows the experimental results. The paper ends with a short conclusion in section 6.
2 Definitions

Definition 1 A data context is defined by a triple $(O, A, R)$, where $O$ and $A$ are two sets and $R \subseteq O \times A$ is a relation between $O$ and $A$. The elements of $O$ are called transactions or objects, while the elements of $A$ are called items or attributes.

For example, Figure 1 represents a data context $(O, A, R)$. $O = \{1, 2, 3, 4, 5, 6, 7, 8\}$ is the set of objects, and $A = \{a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8\}$ is the set of items. The crosses in the table describe the relation $R$ between $O$ and $A$. In a real data context each item and object carries a detailed, descriptive name; in this example we only use numeric labels.

[Figure 1: An example of data context. Cross table with rows 1 to 8 and columns $a_1, \ldots, a_8$; the table entries are missing from this transcription.]

A data context is usually represented by binary data. When, in practice, the attribute values are not binary, a many-valued data context can be transformed into a binary data context by conceptual scaling [8].

Definition 2 Two closure operators are defined, $O_1 \mapsto O_1''$ for subsets of $O$ and $A_1 \mapsto A_1''$ for subsets of $A$, where

$$O_1' := \{a \in A \mid o R a \text{ for all } o \in O_1\}$$
$$A_1' := \{o \in O \mid o R a \text{ for all } a \in A_1\}$$

The two maps $O_1 \mapsto O_1'$ and $A_1 \mapsto A_1'$ are called the Galois connection for $(O, A, R)$. These operators are used to determine a formal concept.

Definition 3 A formal concept of $(O, A, R)$ is a pair $(O_1, A_1)$ with $O_1 \subseteq O$, $A_1 \subseteq A$, $O_1' = A_1$ and $A_1' = O_1$. $O_1$ is called the extent and $A_1$ the intent.

For example, $(68, a_1 a_3 a_4 a_6)$ is a formal concept of the data context of Figure 1: $a_1 a_3 a_4 a_6$ is its intent and $68$, that is the object set $\{6, 8\}$, is its extent.

Definition 4 We say that there is a hierarchical order between two formal concepts $(O_1, A_1)$ and $(O_2, A_2)$ if $O_1 \subseteq O_2$ (or, equivalently, $A_2 \subseteq A_1$). All formal concepts, together with this hierarchical order, form a complete lattice called the concept lattice.

Definition 5 An itemset $C \subseteq A$ is a closed itemset iff $C'' = C$.
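The operators of Definitions 2, 3 and 5 can be sketched in a few lines of Python. The cross table of Figure 1 is not available here, so the `CONTEXT` dictionary below is an assumption: it is reconstructed from the concepts listed in Figure 2, with item $a_i$ written as the integer `i`. The function names are illustrative only.

```python
# Assumed context, reconstructed from the concepts listed in Figure 2.
# Each object maps to the indices of its items, e.g. 7 -> {a1, a3, a4, a5}.
CONTEXT = {
    1: {1, 2, 7},
    2: {1, 2, 7, 8},
    3: {1, 2, 3, 7, 8},
    4: {1, 3, 7, 8},
    5: {1, 2, 4, 6},
    6: {1, 2, 3, 4, 6},
    7: {1, 3, 4, 5},
    8: {1, 3, 4, 6},
}
ITEMS = set(range(1, 9))


def intent(objects):
    """O1': the items shared by every object in `objects` (all items if empty)."""
    shared = set(ITEMS)
    for o in objects:
        shared &= CONTEXT[o]
    return shared


def extent(items):
    """A1': the objects that have every item in `items`."""
    return {o for o, row in CONTEXT.items() if set(items) <= row}


def closure(items):
    """C'': the smallest closed itemset that contains `items`."""
    return intent(extent(items))


def is_formal_concept(objects, items):
    """Definition 3: O1' = A1 and A1' = O1."""
    return intent(objects) == set(items) and extent(items) == set(objects)


if __name__ == "__main__":
    print(is_formal_concept({6, 8}, {1, 3, 4, 6}))  # the example of Definition 3
    print(sorted(closure({3, 4})))                   # closure of a1-free itemset {a3, a4}
```

Under this assumed context, the pair $(68, a_1 a_3 a_4 a_6)$ used as the example of Definition 3 passes the check, and an itemset is closed exactly when `closure` returns it unchanged.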

[Figure 2: An example of knowledge lattice. Each node is written (intent, extent) e(number of objects); the edges of the diagram are not reproduced. Nodes:]

($a_1$, 12345678) e(8)
($a_1 a_7$, 1234) e(4)
($a_1 a_3$, 34678) e(5)
($a_1 a_2$, 12356) e(5)
($a_1 a_4$, 5678) e(4)
($a_1 a_7 a_8$, 234) e(3)
($a_1 a_3 a_7 a_8$, 34) e(2)
($a_1 a_3 a_4$, 678) e(3)
($a_1 a_2 a_3$, 36) e(2)
($a_1 a_2 a_7$, 123) e(3)
($a_1 a_4 a_6$, 568) e(3)
($a_1 a_2 a_4 a_6$, 56) e(2)
($a_1 a_2 a_7 a_8$, 23) e(2)
($a_1 a_3 a_4 a_6$, 68) e(2)
($a_1 a_2 a_3 a_7 a_8$, 3) e(1)
($a_1 a_3 a_4 a_5$, 7) e(1)
($a_1 a_2 a_3 a_4 a_6$, 6) e(1)
($a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8$, $\emptyset$) e(0)

Definition 6 If $C_1$ and $C_2$ are closed itemsets with $C_1 \subseteq C_2$, then we say that there is a hierarchical order between $C_1$ and $C_2$. All closed itemsets, together with this hierarchical order, form a complete lattice called the closed itemset lattice.

Definition 7 A formal concept is called an extended concept if descriptive information about the formal concept in the data context is added to it. We write $(O_1, A_1)$ e(described information) or $(A_1, O_1)$ e(described information) for the extended concept of $(O_1, A_1)$. A concept lattice is called a knowledge lattice if all formal concepts of the concept lattice are replaced by their extended concepts.

Figure 2 presents an example of a knowledge lattice. Each node contains the intent, the extent and the number of objects in the extent.

3 Framework of unifying cluster analysis and association analysis

In this section, we propose a framework for unifying cluster analysis and association analysis (see Figure 3). From the database of transactions, we generate a data context described by the items and transactions. An efficient algorithm is then applied to generate the formal concepts. Once the formal concepts are produced, extended information is extracted with them, according to the needs of the mining task, to form extended concepts. Extended concepts can contain the intent, extent, support and a similarity description. The knowledge lattice can then be generated from the extended concepts.
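The steps described so far (data context, formal concepts, extended concepts) can be sketched as follows. This is a minimal illustration, not the implementation used in the paper: the context is again the one reconstructed from Figure 2 (an assumption), the class `ExtendedConcept` and the function `knowledge_lattice` are hypothetical names, support is assumed to be the fraction of objects in the extent, and the similarity description is left out because the text does not define it. Intents are found by closing the object rows under intersection, which is only practical for small contexts.

```python
from dataclasses import dataclass

# Assumed context, reconstructed from the concepts listed in Figure 2.
CONTEXT = {
    1: {1, 2, 7},
    2: {1, 2, 7, 8},
    3: {1, 2, 3, 7, 8},
    4: {1, 3, 7, 8},
    5: {1, 2, 4, 6},
    6: {1, 2, 3, 4, 6},
    7: {1, 3, 4, 5},
    8: {1, 3, 4, 6},
}
ITEMS = frozenset(range(1, 9))


@dataclass(frozen=True)
class ExtendedConcept:
    """One node of the knowledge lattice (hypothetical structure)."""
    intent: frozenset   # closed itemset
    extent: frozenset   # objects that have every item of the intent
    size: int           # number of objects, written e(n) in Figure 2
    support: float      # size / |O|, an assumed definition of support


def all_intents():
    """Every concept intent is an intersection of object rows (or all items)."""
    intents = {ITEMS}
    for row in CONTEXT.values():
        intents |= {frozenset(row) & known for known in intents}
    return intents


def knowledge_lattice():
    """Return the extended concepts, largest extent first."""
    nodes = []
    for items in all_intents():
        objects = frozenset(o for o, row in CONTEXT.items() if items <= row)
        nodes.append(
            ExtendedConcept(items, objects, len(objects), len(objects) / len(CONTEXT))
        )
    return sorted(nodes, key=lambda n: (-n.size, sorted(n.intent)))


if __name__ == "__main__":
    for node in knowledge_lattice():
        print(sorted(node.intent), sorted(node.extent), f"e({node.size})")
```

Both mining tasks can read from the same list of nodes: filtering on `size` or `support` gives closed frequent patterns, while the extents give overlapping groups of objects described by their intents.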
Finally, closed frequent patterns and clusters can be produced from the same knowledge lattice or extended concepts.

[Figure 3: Framework of unifying cluster analysis and association analysis. Boxes: Database; Data context; Formal concepts; Knowledge lattice: Extended concepts (Concepts, Support, Description, ...); Closed Frequent Pattern; Cluster.]

The data context is the basis of the mining task. It needs an understandable description for each item and transaction. Sometimes we need to reduce, transpose or order the data context. For example, when the data are high-dimensional, and especially when the object set is smaller than the item set, we can transpose the data context before generating the formal concepts. Analyzing most lattice algorithms, we find that an algorithm can focus on either the items or the transactions of the data context, and that its performance can differ according to the number of items and transactions.

In this framework, the generation of the formal concepts and the knowledge lattice is the essential step. The key to the applications is the performance of the algorithm that generates the formal concepts or closed itemsets. So we focus on the lattice algorithm and, in the next section, propose a new algorithm based on the lattice structure to generate frequent patterns.

4 New algorithm

In this section, we analyze the search space of the closed itemsets of a data context, and then present a new algorithm to analyze and represent large data.

4.1 Analysis of the search space

Using one example, a data context with 4 attributes ($a_m$, $a_{m-1}$, $a_{m-2}$, $a_{m-3}$), we analyze the search space of closed itemsets (see Figure 4).

[Figure 4: An example of the search space of closed itemsets. The nodes are the 15 non-empty subsets of $\{a_{m-3}, a_{m-2}, a_{m-1}, a_m\}$.]

Figure 4 illustrates that, for a data context with 4 attributes, any node may be a closed itemset. The search space of closed itemsets is very large when there are many attributes, and it is hard for the concept lattice structure to cope with the complexity of very large data. So we propose a new method that decomposes the search space and then deals with each partition separately. In order to discuss the decomposition of the search space, we give the following definition.

Definition 8 Given an attribute $a_i \in A$ of the context $(O, A, R)$ and a set $E$ with $a_i \notin E$, we define

$$a_i \otimes E = \{\{a_i\} \cup X \text{ for all } X \subseteq E\}.$$

$$A_k = \{a_k\} \cup \{\{a_k\} \cup X_i\},\ X_i \in A_j,\ k+1 \le j \le m \quad = a_k \otimes \{a_{k+1}, a_{k+2}, \ldots, a_m\}$$

We can decompose the search space into many partitions such as $A_m$, $A_{m-1}$, $A_{m-2}$, $A_{m-3}$, or combinations of some of them. In each partition we can look for the closed itemsets independently. But the problem is how to balance the number of closed itemsets across the partitions, and whether each partition contains closed itemsets at all. For example, for the data context of Figure 1, we can decompose the search space into the following 4 partitions:

[Figure 5: Decomposition of the search space of the data context of Figure 1 into four partitions (partition 1 to partition 4) over $A_1, \ldots, A_8$.]

The result is that there are no closed itemsets in partition 4, partition 3 and partition 2, but 17 closed itemsets in partition 1. So this strategy for decomposing the search space has a problem, and we need to improve it. One solution is to order the data context.

Definition 9 A data context is called an ordered data context if the items of the data context are ordered by the number of objects of each item, from the smallest to the biggest, and items with the same objects are merged into one item. In what follows, the ordered data context derived from $(O, A, R)$ is written in the same triple form, with its items taken in the new order.

Figure 6 shows the ordered data context of the data context of Figure 1. From the ordered data context, using the same method as above to decompose the search space into 4 partitions, we get closed itemsets in each partition. We can prove that there exist closed itemsets in each $A_i$ of an ordered data context. For example, there are respectively 6, 6, 4 and 1 closed itemsets in the 4 partitions of the ordered data context (see Figure 6).

Definition 10 Given an item $a_i$ of a data context $(O, A, R)$, all subsets of $\{a_i, a_{i+1}, \ldots, a_{m-1}, a_m\}$ that include $a_i$ form a search sub-space (for closed itemsets) that is called the folding search sub-space (F3S) of $a_i$, denoted $F3S_i$.
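The effect of ordering the data context can be checked on the running example. The sketch below rests on several assumptions that are not stated in the text: the context is reconstructed from Figure 2; ties in the ordering of Definition 9 are broken by descending item index (which reproduces the column order $a_5, a_8, a_6, a_7, a_4, a_3, a_2, a_1$ of Figure 6); the four partitions are taken to be consecutive pairs $(A_1, A_2), (A_3, A_4), (A_5, A_6), (A_7, A_8)$; and the closure of the empty itemset is not counted. No two items of this context share the same objects, so the merging step of Definition 9 is not needed here. All function names are illustrative.

```python
# Assumed context, reconstructed from the concepts listed in Figure 2.
CONTEXT = {
    1: {1, 2, 7},
    2: {1, 2, 7, 8},
    3: {1, 2, 3, 7, 8},
    4: {1, 3, 7, 8},
    5: {1, 2, 4, 6},
    6: {1, 2, 3, 4, 6},
    7: {1, 3, 4, 5},
    8: {1, 3, 4, 6},
}
ITEMS = frozenset(range(1, 9))


def closed_itemsets():
    """All closed itemsets: the object rows closed under intersection, plus ITEMS."""
    closed = {ITEMS}
    for row in CONTEXT.values():
        closed |= {frozenset(row) & known for known in closed}
    return closed


def item_support(item):
    """Number of objects that have `item`."""
    return sum(1 for row in CONTEXT.values() if item in row)


def ordered_items():
    """Definition 9: items by increasing number of objects (assumed tie-break)."""
    return sorted(ITEMS, key=lambda a: (item_support(a), -a))


def partition_counts(order, group_size=2):
    """Count closed itemsets per partition; A_k holds sets whose first item is order[k]."""
    position = {item: k for k, item in enumerate(order)}
    top = frozenset.intersection(*(frozenset(r) for r in CONTEXT.values()))
    counts = [0] * -(-len(order) // group_size)
    for itemset in closed_itemsets():
        if itemset == top:          # closure of the empty itemset, not counted (assumption)
            continue
        first = min(position[a] for a in itemset)
        counts[first // group_size] += 1
    return counts


if __name__ == "__main__":
    print(ordered_items())
    print(partition_counts(sorted(ITEMS)))    # original item order
    print(partition_counts(ordered_items()))  # ordered data context
```

Under these assumptions the original item order puts every counted closed itemset into the first partition, while the ordered context spreads them over all four, which agrees with the counts quoted in the text.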

[Figure 6: An example of ordered data context. Cross table with columns in the order $a_5, a_8, a_6, a_7, a_4, a_3, a_2, a_1$; the table entries are missing from this transcription.]

Summing up the analysis of the search space of closed itemsets: we can order the data context as an ordered data context, so that the search space of closed itemsets is

$$F3S_m \cup F3S_{m-1} \cup F3S_{m-2} \cup F3S_{m-3} \cup \cdots \cup F3S_i \cup \cdots \cup F3S_1,$$

and then decompose the search space into partitions. We can generate closed itemsets in each partition.

4.2 The new algorithm

Definition 11 Let $A_1 \subseteq A$ be an itemset, $A_1 = \{b_1, b_2, \ldots, b_i, \ldots, b_k\}$, $b_i \in A$, and let $A_1$ be an infrequent itemset. The candidate of the next closed itemset of $A_1$, written $CA_1$, is

$$A_1 \oplus a_i = (A_1 \cap \{a_1, a_2, \ldots, a_{i-1}\}) \cup \{a_i\},$$

where $a_i < b_k$, $a_i \notin A_1$, and $a_i$ is the biggest item of $A$ with $A_1 < A_1 \oplus a_i$, following the order $a_1 < \ldots < a_i < \ldots < a_m$.

We propose a new algorithm that can be used to generate closed itemsets or frequent closed itemsets. The principle of the algorithm is presented by the following steps:

- Decompose the search space into partitions.
  - Convert $(O, A, R)$ to its ordered data context, where $A = \{a_1, a_2, \ldots, a_i, \ldots, a_m\}$.
  - In order to balance the number of closed itemsets across the partitions, some items of $A$ are chosen to form an ordered set $P$:
    1) $P = \{a_{P_T}, a_{P_{T-1}}, \ldots, a_{P_k}, \ldots, a_{P_1}\}$, $|P| = T$, $a_{P_k} \in A$
    2) $a_{P_T} < a_{P_{T-1}} < \ldots < a_{P_k} < \ldots < a_{P_2} < a_{P_1} = a_m$
    3) A parameter $DP$ ($0 < DP < 1$) is used to choose $a_{P_k}$, where $DP = \dfrac{|\{a_1, \ldots, a_{P_k}\}|}{|\{a_1, \ldots, a_{P_{k-1}}\}|}$
  - Get the partitions $[a_{P_k}, a_{P_{k+1}})$ and $[a_{P_T})$:
    1) The interval $[a_{P_k}, a_{P_{k+1}})$ is the search space from item $a_{P_k}$ to $a_{P_{k+1}}$.
    2) $[a_{P_k}, a_{P_{k+1}}) = \bigcup_{P_k \le i < P_{k+1}} F3S_i$ and $[a_{P_T}) = F3S_{P_T}$
- Generate the next frequent closed itemset from an itemset $A_1$ in each partition.
  - If $|A_1'| \ge minsupport$, we search for the next closure of $A_1$.
  - If $|A_1'| < minsupport$, we search for $CA_1$. The closed itemsets between $A_1$ and $CA_1$ are ignored.

Conceptual clustering [5, 12] can seek clusters by concept structures. One approach to conceptual clustering is based on the concept lattice [3].
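To make the role of minsupport concrete, here is a sketch of closed itemset enumeration in the item order $a_1 < \ldots < a_m$, followed by a support filter. It uses the classical NextClosure procedure from formal concept analysis [8] as a stand-in: it does not implement the partitioning into $[a_{P_k}, a_{P_{k+1}})$ or the jump to the candidate $CA_1$ described above, so infrequent closed itemsets are still visited and simply discarded. The context is the one reconstructed from Figure 2 (an assumption), support is taken as the number of objects in the extent, and the function names are illustrative.

```python
# Assumed context, reconstructed from the concepts listed in Figure 2.
CONTEXT = {
    1: {1, 2, 7},
    2: {1, 2, 7, 8},
    3: {1, 2, 3, 7, 8},
    4: {1, 3, 7, 8},
    5: {1, 2, 4, 6},
    6: {1, 2, 3, 4, 6},
    7: {1, 3, 4, 5},
    8: {1, 3, 4, 6},
}
M = 8  # items a1..a8 are the integers 1..M


def extent(items):
    """Objects that have every item in `items`."""
    return {o for o, row in CONTEXT.items() if set(items) <= row}


def closure(items):
    """Items shared by all objects in the extent of `items`."""
    shared = set(range(1, M + 1))
    for o in extent(items):
        shared &= CONTEXT[o]
    return shared


def next_closure(current):
    """Classical NextClosure step: the closed itemset after `current`, or None."""
    kept = set(current)
    for i in range(M, 0, -1):
        if i in kept:
            kept.discard(i)          # drop items >= i before trying smaller ones
        else:
            candidate = closure(kept | {i})
            # Accept only if the closure adds no item smaller than i.
            if all(j >= i for j in candidate - kept):
                return frozenset(candidate)
    return None


def frequent_closed_itemsets(minsupport):
    """Closed itemsets whose extent has at least `minsupport` objects."""
    found = []
    current = frozenset(closure(set()))
    while current is not None:
        if len(extent(current)) >= minsupport:
            found.append(current)
        current = next_closure(current)
    return found


if __name__ == "__main__":
    for threshold in (1, 2, 3):
        print(threshold, len(frequent_closed_itemsets(threshold)))
```

The design point of the paper's algorithm is precisely to avoid the wasted visits this sketch makes: once an itemset is infrequent, everything between it and $CA_1$ can be skipped.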
When $minsupport = 1$, this algorithm can be used to generate all closed itemsets, and then overlapping conceptual clusters based on the algorithm of [3].

5 Experimental results

We test our algorithm for generating frequent closed itemsets and clusters on some data sets from UCI [14] (see Table 1).

Table 1: The datasets for experiments (columns: DataSet, Objects, Items, Closed itemsets; the numeric values are missing from this transcription)

1) breast-cancer-wisconsin
2) house-votes
3) audiology
4) lung-cancer
5) agaricus-lepiota
6) promoters
7) soybean-large
8) dermatology

The algorithm is implemented in Java and tested on all of the above contexts in two cases, to compare and analyze its performance:

Case 1: generating frequent itemsets and clusters separately from the context;
Case 2: generating frequent itemsets and clusters from closed itemsets, based on the new strategy.

The experimental results (see Figure 7) show that the total time cost of Case 1 is much higher than that of Case 2. So integrating cluster analysis and association analysis on the basis of closed itemset mining can reduce the high cost of the two mining tasks for large sets of transactions.

[Figure 7: The time cost (milliseconds) for two cases on test datasets]

6 Conclusion and further work

In this paper, we propose a strategy to unify cluster analysis and association analysis for transactional databases in order to reduce the high cost of the data mining tasks. From the data context, the knowledge lattice can be generated with extended concepts, which can contain the intent, extent, support and a similarity description. Closed frequent patterns and clusters can then be produced from the same knowledge lattice or extended concepts. Furthermore, we present a new algorithm for the analysis of large and high-dimensional data. In future work, we will develop the algorithm to analyze huge and distributed data, and improve it for mining non-transactional databases.

Acknowledgements: This work is supported by Science Foundation Ireland via the Autonomic Management of Communications Networks and Services programme (grant no. 04/IN3/I4040C) and the project of EU IST Network of Excellence OPAALS.

References:

[1] G. Birkhoff. Lattice Theory. American Mathematical Society, Providence, RI, 3rd edition.
[2] C. Carpineto and G. Romano. Galois: An order theoretic approach to conceptual clustering. In Proc. of the Machine Learning conf., pages 33-40.
[3] C. Carpineto and G. Romano. Galois: An order-theoretic approach to conceptual clustering. In Proceedings of ICML 93, pages 33-40, Amherst, July.
[4] C. Carpineto and G. Romano. Concept Data Analysis: Theory and Applications. John Wiley and Sons.
[5] D. H. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, (2).
[6] H. Fu and E. Mephu Nguifo. Partitioning large data to scale up lattice-based algorithm. In Proceedings of ICTAI03, Sacramento, CA, November. IEEE Computer Press.
[7] H. Fu and E. Mephu Nguifo. Mining frequent closed itemsets for large data. In Proceedings of The 2004 International Conference on Machine Learning and Applications (ICMLA04), Louisville, USA, December.
[8] B. Ganter and R. Wille. Formal Concept Analysis. Mathematical Foundations. Springer.
[9] R. Godin, G. Mineau, R. Missaoui, and H. Mili. Méthodes de classification conceptuelle basées sur les treillis de Galois et applications. Revue d'intelligence artificielle, 9(2).
[10] R. Godin, R. Missaoui, and A. April. Experimental comparison of Galois lattice browsing with conventional information retrieval methods. Internat. J. Man-Machine Studies, (38).
[11] D. Kourie and G. Oosthuizen. Lattices in Machine Learning: Complexity Issues. Acta Informatica, 35(4).
[12] M. Lebowitz. Experiments with incremental concept formation: Unimem. Machine Learning, (2).
[13] E. Mephu Nguifo and P. Njiwoua. Treillis de concepts et classification supervisée. Technique et Science Informatiques, 24. Hermes-Lavoisier.
[14] C. Merz and P. Murphy. UCI Repository of Machine Learning databases.
[15] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Efficient mining of association rules using closed itemsets lattices. Journal of Information Systems, 24(1):25-46.

The Theory of Concept Analysis and Customer Relationship Mining

The Theory of Concept Analysis and Customer Relationship Mining The Application of Association Rule Mining in CRM Based on Formal Concept Analysis HongSheng Xu * and Lan Wang College of Information Technology, Luoyang Normal University, Luoyang, 471022, China xhs_ls@sina.com

More information

Categorical Data Visualization and Clustering Using Subjective Factors

Categorical Data Visualization and Clustering Using Subjective Factors Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,

More information

A Web-based Browsing Mechanism Based on Conceptual Structures

A Web-based Browsing Mechanism Based on Conceptual Structures A Web-based Browsing Mechanism Based on Conceptual Structures Mihye Kim and Paul Compton School of Computer Science and Engineering University of New South Wales, Sydney, NSW 2052, Australia {mihyek, compton}@cse.unsw.edu.au

More information

Decision Tree Learning on Very Large Data Sets

Decision Tree Learning on Very Large Data Sets Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa

More information

International Journal of Advance Research in Computer Science and Management Studies

International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online

More information

Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL

Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Jasna S MTech Student TKM College of engineering Kollam Manu J Pillai Assistant Professor

More information

On algorithms for construction line diagrams of concept lattices and the set of all concepts

On algorithms for construction line diagrams of concept lattices and the set of all concepts On algorithms for construction line diagrams of concept lattices and the set of all concepts Sergey A. Yevtushenko, August 7, 2001 Scientific advisor Prof. Dr. Tatiana Taran Outline of a talk Formal Concept

More information

Efficient Data Mining Based on Formal Concept Analysis

Efficient Data Mining Based on Formal Concept Analysis Efficient Data Mining Based on Formal Concept Analysis Gerd Stumme Institut für Angewandte Informatik und Formale Beschreibungsverfahren AIFB, Universität Karlsruhe, D 76128 Karlsruhe, Germany www.aifb.uni-karlsruhe.de/wbs/gst;

More information

Impact of Boolean factorization as preprocessing methods for classification of Boolean data

Impact of Boolean factorization as preprocessing methods for classification of Boolean data Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,

More information

Decision Trees for Mining Data Streams Based on the Gaussian Approximation

Decision Trees for Mining Data Streams Based on the Gaussian Approximation International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 Decision Trees for Mining Data Streams Based on the Gaussian Approximation S.Babu

More information

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan

More information

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.

Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013. Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing

More information

D-optimal plans in observational studies

D-optimal plans in observational studies D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

More information

International Journal of Innovative Research in Computer and Communication Engineering

International Journal of Innovative Research in Computer and Communication Engineering FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,

More information

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data

An Analysis on Density Based Clustering of Multi Dimensional Spatial Data An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,

More information

Clustering Data Streams

Clustering Data Streams Clustering Data Streams Mohamed Elasmar Prashant Thiruvengadachari Javier Salinas Martin gtg091e@mail.gatech.edu tprashant@gmail.com javisal1@gatech.edu Introduction: Data mining is the science of extracting

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques

Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,

More information

Self Organizing Maps for Visualization of Categories

Self Organizing Maps for Visualization of Categories Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, julian.szymanski@eti.pg.gda.pl

More information

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Purpose: key concepts in mining frequent itemsets understand the Apriori algorithm run Apriori in Weka GUI and in programatic way 1 Theoretical

More information

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset

An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang

More information

Protein Protein Interaction Networks

Protein Protein Interaction Networks Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics

More information

Dynamic index selection in data warehouses

Dynamic index selection in data warehouses Dynamic index selection in data warehouses Stéphane Azefack 1, Kamel Aouiche 2 and Jérôme Darmont 1 1 Université de Lyon (ERIC Lyon 2) 5 avenue Pierre Mendès-France 69676 Bron Cedex France jerome.darmont@univ-lyon2.fr

More information

Distributed Computing and Big Data: Hadoop and MapReduce

Distributed Computing and Big Data: Hadoop and MapReduce Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Big Data with Rough Set Using Map- Reduce

Big Data with Rough Set Using Map- Reduce Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,

More information

CLANN: Concept Lattice-based Artificial Neural Network for supervised classification

CLANN: Concept Lattice-based Artificial Neural Network for supervised classification CLANN: Concept Lattice-based Artificial Neural Network for supervised classification Norbert Tsopzé 1,2, Engelbert Mephu Nguifo 2, and Gilbert Tindo 1 1 Université de Yaoundé I, Faculté des Sciences, Département

More information
