Cluster analysis and Association analysis for the same data
Huaiguo Fu
Telecommunications Software & Systems Group, Waterford Institute of Technology, Waterford, Ireland

Abstract: Both cluster analysis and association analysis are important data mining tasks. Some applications need both cluster analysis and association analysis for the same data, and each task is very time-consuming on large data. To reduce the high cost of running the two mining tasks on large transaction data sets, we propose a strategy that unifies cluster analysis and association analysis. This paper also presents a new core algorithm for the strategy, suited to large and high-dimensional data. The experimental results show the efficiency of this algorithm.

Key Words: Association analysis, Clustering, Closed set, Concept lattice, Algorithm

1 Introduction

Both cluster analysis and association analysis are important data mining tasks. In recent years they have attracted a lot of attention in research and in applications. They play an important role in data mining applications such as text mining, Web mining, information retrieval, biomedical informatics and many others. A variety of techniques and approaches for cluster analysis and association analysis have been developed and successfully applied to real-life data mining problems. However, as data continue to grow inexorably in size and complexity, these techniques and approaches face challenges such as very large data, high-dimensional data, distributed heterogeneous data and complex data. In some applications, we need both cluster analysis and association analysis for the same data, and each task is very time-consuming on large data.
Although cluster analysis and association analysis are separate tasks in research and applications, we propose to unify them for mining transaction databases in order to reduce the high cost of the mining tasks. This is the key motivation for unifying cluster analysis and association analysis. Furthermore, the two tasks can be unified for transaction databases for the following reasons:

1) Both analyze the relationships between the elements of a data set. In fact, the two tasks extract the same essential relationship: similarity. Only the description and the bounds of the relationship differ. A frequent pattern reveals one kind of similarity between data elements. Cluster analysis may reveal associations and relationships in the data that contribute to mining models or rules from the data. So the elements in a frequent pattern are similar, and similar elements may have the same frequency.

2) Mining closed sets can be an essential step for both cluster analysis and association analysis on transactional data. Existing work shows that clusters and frequent patterns can be extracted from closed sets [2, 15]. Cluster analysis and association analysis can share the closed sets mined from the same data set, so the closed sets need not be extracted separately for each task.

3) Closed set mining provides a way to interpret the clusters and frequent patterns. For most techniques and approaches of cluster analysis and association analysis, it is hard to interpret the mining results. For example, it is hard to interpret the clusters and frequent patterns produced by existing mining techniques. It is also hard to give a meaning to the distance measure in most clustering methods. Closed sets are derived from formal concept analysis (FCA), and formal concepts can help us interpret the closed sets.
Closed set mining facilitates pattern interpretation. In human thinking and life, objects are clustered by concepts and attributes, and we can interpret attribute patterns and object patterns with concepts. So concept-based methods can be used to interpret the clusters and frequent patterns.

In this paper, the idea of unifying cluster analysis and association analysis focuses on transaction databases. The main framework of the idea is:

- Generating the data context, with descriptions of items or transactions, from the transaction database
- Mining the closed sets and the lattice of closed sets of the transaction database with FCA
- Adding extended information such as support, similarity and interpretation to each closed set. We propose a new structure for each node of the lattice: the node contains the attribute set, the object set, the number of objects, the support and a similarity description.
- Generating the clusters and closed frequent patterns together with their interpretation

The core of FCA is the concept lattice. The theoretical foundation of the concept lattice rests on mathematical lattice theory [1, 8]. The lattice is a popular mathematical structure for modeling conceptual hierarchies, and the concept lattice is a method for deriving conceptual structures from data. It allows us to analyze and mine complex data for tasks such as classification [11, 13], association rule mining [6, 7] and clustering [10, 9, 4]. Because of the high dimension and large volume of data, we need more scalable and more efficient techniques to analyze and represent large and high-dimensional data sets. In this paper we present a new algorithm to analyze large and high-dimensional data.

The rest of this paper is organized as follows. Basic definitions for unifying cluster analysis and association analysis are presented in the next section. The framework of unifying cluster analysis and association analysis is introduced in section 3. In section 4, we present a new algorithm. Section 5 shows the experimental results. The paper ends with a short conclusion in section 6.
2 Definitions

Definition 1 A data context is defined by a triple $(O, A, R)$, where $O$ and $A$ are two sets, and $R$ is a relation between $O$ and $A$. The elements of $O$ are called transactions or objects, while the elements of $A$ are called items or attributes.

For example, Figure 1 represents a data context $(O, A, R)$. $O = \{1, 2, 3, 4, 5, 6, 7, 8\}$ is the set of objects, and $A = \{a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8\}$ is the set of items. The crosses in the table describe the relation $R$ between $O$ and $A$. In a real data context each item and object carries a descriptive name; in this example we simply use numeric labels for the items and objects.

Figure 1: An example of data context

A data context is usually represented by binary data. In practice the attribute values are often not binary, but a many-valued data context can be transformed into a binary data context by conceptual scaling [8].

Definition 2 Two closure operators are defined as $O_1 \mapsto O_1''$ for subsets of $O$ and $A_1 \mapsto A_1''$ for subsets of $A$, where

$$O_1' := \{a \in A \mid oRa \text{ for all } o \in O_1\}$$

$$A_1' := \{o \in O \mid oRa \text{ for all } a \in A_1\}$$

These two operators are called the Galois connection for $(O, A, R)$. They are used to determine a formal concept.

Definition 3 A formal concept of $(O, A, R)$ is a pair $(O_1, A_1)$ with $O_1 \subseteq O$, $A_1 \subseteq A$, $O_1' = A_1$ and $A_1' = O_1$. $O_1$ is called the extent and $A_1$ the intent.

For example, $(68, a_1 a_3 a_4 a_6)$ is a formal concept of the data context of Figure 1. $a_1 a_3 a_4 a_6$ is the intent of $(68, a_1 a_3 a_4 a_6)$, and $68$ is its extent.

Definition 4 We say that there is a hierarchical order between two formal concepts $(O_1, A_1)$ and $(O_2, A_2)$ if $O_1 \subseteq O_2$ (or, equivalently, $A_2 \subseteq A_1$). All formal concepts together with this hierarchical order form a complete lattice called the concept lattice.

Definition 5 An itemset $C \subseteq A$ is a closed itemset iff $C = C''$.
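The two operators of Definition 2 and the tests of Definitions 3 and 5 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. The table `CONTEXT` is an assumption: it is reconstructed from the extents listed in the lattice example of Figure 2, because the crosses of Figure 1 are not available in this text. The function names are hypothetical.

```python
# Data context (O, A, R) as a mapping object -> set of items.
# ASSUMPTION: reconstructed from the extents in the lattice example (Figure 2);
# the original table of crosses is not available in this text.
CONTEXT = {
    1: {"a1", "a2", "a7"},
    2: {"a1", "a2", "a7", "a8"},
    3: {"a1", "a2", "a3", "a7", "a8"},
    4: {"a1", "a3", "a7", "a8"},
    5: {"a1", "a2", "a4", "a6"},
    6: {"a1", "a2", "a3", "a4", "a6"},
    7: {"a1", "a3", "a4", "a5"},
    8: {"a1", "a3", "a4", "a6"},
}
ITEMS = frozenset().union(*CONTEXT.values())


def common_items(objects):
    """O1' : the items shared by every object of O1 (all items if O1 is empty)."""
    shared = set(ITEMS)
    for o in objects:
        shared &= CONTEXT[o]
    return frozenset(shared)


def common_objects(items):
    """A1' : the objects that have every item of A1."""
    return frozenset(o for o, row in CONTEXT.items() if set(items) <= row)


def is_formal_concept(extent, intent):
    """Definition 3: O1' = A1 and A1' = O1."""
    return (common_items(extent) == frozenset(intent)
            and common_objects(intent) == frozenset(extent))


def is_closed(itemset):
    """Definition 5: C is closed iff C = C''."""
    return common_items(common_objects(itemset)) == frozenset(itemset)


print(is_formal_concept({6, 8}, {"a1", "a3", "a4", "a6"}))
print(is_closed({"a3"}), sorted(common_items(common_objects({"a3"}))))
```

Under the reconstructed table, the pair $(68, a_1 a_3 a_4 a_6)$ passes the concept test, while $\{a_3\}$ alone is not closed because every object that has $a_3$ also has $a_1$.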
Nodes of the lattice, written as (intent, extent) e(number of objects):

($a_1$, 12345678) e(8)
($a_1 a_7$, 1234) e(4)
($a_1 a_3$, 34678) e(5)
($a_1 a_2$, 12356) e(5)
($a_1 a_4$, 5678) e(4)
($a_1 a_7 a_8$, 234) e(3)
($a_1 a_3 a_7 a_8$, 34) e(2)
($a_1 a_3 a_4$, 678) e(3)
($a_1 a_2 a_3$, 36) e(2)
($a_1 a_2 a_7$, 123) e(3)
($a_1 a_4 a_6$, 568) e(3)
($a_1 a_2 a_4 a_6$, 56) e(2)
($a_1 a_2 a_7 a_8$, 23) e(2)
($a_1 a_3 a_4 a_6$, 68) e(2)
($a_1 a_2 a_3 a_7 a_8$, 3) e(1)
($a_1 a_3 a_4 a_5$, 7) e(1)
($a_1 a_2 a_3 a_4 a_6$, 6) e(1)
($a_1 a_2 a_3 a_4 a_5 a_6 a_7 a_8$, $\emptyset$) e(0)

Figure 2: An example of knowledge lattice

Definition 6 If $C_1$ and $C_2$ are closed itemsets and $C_1 \subseteq C_2$, then we say that there is a hierarchical order between $C_1$ and $C_2$. All closed itemsets together with this hierarchical order form a complete lattice called the closed itemset lattice.

Definition 7 A formal concept is called an extended concept if it is augmented with descriptive information about the formal concept in the data context. We write $(O_1, A_1)$ e(described information) or $(A_1, O_1)$ e(described information) for the extended concept of $(O_1, A_1)$. A concept lattice is called a knowledge lattice if all formal concepts of the concept lattice are replaced by their extended concepts.

Figure 2 presents an example of a knowledge lattice. Each node contains the intent, the extent and the number of objects in the extent.

3 Framework of unifying cluster analysis and association analysis

In this section, we propose a framework for unifying cluster analysis and association analysis (see Figure 3). From the transaction database, we generate a data context described by the items and transactions. An efficient algorithm is then applied to generate the formal concepts. Once the formal concepts are produced, extended information is extracted for them, according to the needs of the mining task, to form extended concepts. Extended concepts can contain the intent, the extent, the support and a similarity description. The knowledge lattice can then be generated from the extended concepts.
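As a rough illustration of Definition 7, the sketch below builds extended concepts of the form (intent, extent) e(number of objects) for a small context. It is only a sketch under assumptions: the table `CONTEXT` is reconstructed from the extents of the lattice example, the relative `support` field is one possible choice of extended information, and the intents are enumerated with the simple "close under intersection" method rather than with the algorithm proposed in this paper.

```python
# ASSUMPTION: context reconstructed from the extents of the lattice example.
CONTEXT = {
    1: {"a1", "a2", "a7"},
    2: {"a1", "a2", "a7", "a8"},
    3: {"a1", "a2", "a3", "a7", "a8"},
    4: {"a1", "a3", "a7", "a8"},
    5: {"a1", "a2", "a4", "a6"},
    6: {"a1", "a2", "a3", "a4", "a6"},
    7: {"a1", "a3", "a4", "a5"},
    8: {"a1", "a3", "a4", "a6"},
}


def all_intents(context):
    """Every closed itemset: the full item set plus all intersections of rows."""
    items = frozenset().union(*context.values())
    intents = {items}
    for row in context.values():
        # Intersect the new row with everything found so far.
        intents |= {frozenset(row) & known for known in intents}
    return intents


def extended_concepts(context):
    """Attach the extent, its size e and a relative support to each intent."""
    nodes = []
    for intent in all_intents(context):
        extent = frozenset(o for o, row in context.items() if intent <= row)
        nodes.append({
            "intent": intent,
            "extent": extent,
            "e": len(extent),                       # number of objects
            "support": len(extent) / len(context),  # hypothetical extra field
        })
    # Largest extents first, ties broken by intent for a stable listing.
    return sorted(nodes, key=lambda n: (-n["e"], sorted(n["intent"])))


for node in extended_concepts(CONTEXT):
    print(sorted(node["intent"]), sorted(node["extent"]), f"e({node['e']})")
```

Closed frequent patterns and clusters can then be read from the same list: filtering on `e` gives the frequent closed itemsets, and the extents give overlapping groups of objects.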
Finally, closed frequent patterns and clusters can be produced from the same knowledge lattice or extended concepts.

Figure 3: Framework of unifying cluster analysis and association analysis (Database; Data context; Formal concepts; Knowledge lattice of extended concepts: concepts, support, description, ...; Closed frequent patterns and clusters)

The data context is the basis of the mining task. It needs an understandable description for each item and transaction. Sometimes we need to reduce, transpose or order the data context. For example, when the data are high-dimensional, especially when the object set is smaller than the item set, we can transpose the data context to generate the formal concepts. Analyzing most lattice algorithms, we find that an algorithm focuses either on the items or on the transactions of the data context, so its performance can differ according to the number of items and transactions. In this framework, the generation of the formal concepts and the knowledge lattice is the essential step, and the key to applications is the performance of the algorithm that generates the formal concepts or closed itemsets. We therefore focus on the lattice algorithm and, in the next section, propose a new algorithm based on the lattice structure to generate frequent patterns.

4 New algorithm

In this section, we analyze the search space of the closed itemsets of a data context, and then present a new algorithm to analyze and represent large data.

4.1 Analysis of the search space

Using one example, a data context with 4 attributes ($a_m$, $a_{m-1}$, $a_{m-2}$, $a_{m-3}$), we analyze the search space of closed itemsets (see Figure 4).

Figure 4: An example of the search space of closed itemsets (the nodes are the 15 non-empty subsets of $\{a_{m-3}, a_{m-2}, a_{m-1}, a_m\}$)

Figure 4 illustrates that each node may be a closed itemset for a data context with 4 attributes. The search space of closed itemsets becomes very large when there are many attributes, and it is hard for the concept lattice structure to cope with the complexity of very large data. So we propose a new method that decomposes the search space and then deals with each partition separately. To discuss the decomposition of the search space, we give the following definition.

Definition 8 Given an attribute $a_i \in A$ of the context $(O, A, R)$ and a set $E$ with $a_i \notin E$, we define

$$a_i \otimes E = \{\{a_i\} \cup X \text{ for all } X \subseteq E\}$$

$$A_k = \{\{a_k\}\} \cup \{\{a_k\} \cup X_i \mid X_i \in A_j,\ k+1 \le j \le m\} = a_k \otimes \{a_{k+1}, a_{k+2}, \ldots, a_m\}$$

We can decompose the search space into many partitions such as $A_m$, $A_{m-1}$, $A_{m-2}$, $A_{m-3}$ or combinations of some of them. In each partition we can look for the closed itemsets independently. The problem is how to balance the number of closed itemsets across the partitions, and whether each partition contains closed itemsets at all. For example, for the data context of Figure 1, we can decompose the search space into the following 4 partitions:

partition 1: $A_8$, $A_7$, $A_6$, $A_5$
partition 2: $A_4$, $A_3$
partition 3: $A_2$
partition 4: $A_1$

Figure 5: Decomposition of the search space of the data context of Figure 1

The result is that there are no closed itemsets in partition 4, partition 3 and partition 2, but 17 closed itemsets in partition 1. So this strategy for decomposing the search space has problems, and we need to improve it. One solution is to order the data context.

Definition 9 A data context is called an ordered data context if we order its items by the number of objects of each item, from the smallest to the biggest, and merge the items that have the same objects into one item. We refer to the result as the ordered data context of $(O, A, R)$.

The following example (see Figure 6) is the ordered data context of the data context of Figure 1. From the ordered data context, using the same method as above to decompose the search space into 4 partitions, we get closed itemsets in each partition. We can prove that closed itemsets exist in each $A_i$ of an ordered data context. For example, there are respectively 6, 6, 4 and 1 closed itemsets in the 4 partitions of the ordered data context (see Figure 6).

Definition 10 For an item $a_i$ of a data context $(O, A, R)$, all subsets of $\{a_i, a_{i+1}, \ldots, a_{m-1}, a_m\}$ that include $a_i$ form a search sub-space (for closed itemsets) that is called the folding search sub-space (F3S) of $a_i$, denoted $F3S_i$.
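The effect of ordering the data context can be checked with a short Python sketch. Several things here are assumptions rather than statements from the paper: the table `CONTEXT` is reconstructed from the extents of the lattice example; ties between items with the same number of objects are broken by descending item index, chosen only because it reproduces the column order of the ordered context; merging of items with identical object sets is omitted because no two items coincide in this example; and only closed itemsets with a non-empty extent are counted. A closed itemset is assigned to $F3S_i$ when its first item in the order sits at position $i$.

```python
# ASSUMPTION: context reconstructed from the extents of the lattice example.
CONTEXT = {
    1: {"a1", "a2", "a7"},
    2: {"a1", "a2", "a7", "a8"},
    3: {"a1", "a2", "a3", "a7", "a8"},
    4: {"a1", "a3", "a7", "a8"},
    5: {"a1", "a2", "a4", "a6"},
    6: {"a1", "a2", "a3", "a4", "a6"},
    7: {"a1", "a3", "a4", "a5"},
    8: {"a1", "a3", "a4", "a6"},
}
# Partitions given as sets of positions i (grouping of the sub-spaces A_i).
PARTITIONS = [{8, 7, 6, 5}, {4, 3}, {2}, {1}]


def closed_itemsets(context):
    """Full item set plus all intersections of rows."""
    closed = {frozenset().union(*context.values())}
    for row in context.values():
        closed |= {frozenset(row) & known for known in closed}
    return closed


def item_order(context):
    """Items by ascending number of objects (tie-break is an assumption)."""
    support = {}
    for row in context.values():
        for item in row:
            support[item] = support.get(item, 0) + 1
    return sorted(support, key=lambda a: (support[a], -int(a[1:])))


def partition_counts(context, partitions):
    """Count closed itemsets with non-empty extent in each partition."""
    position = {a: i + 1 for i, a in enumerate(item_order(context))}
    counts = [0] * len(partitions)
    for itemset in closed_itemsets(context):
        has_objects = any(itemset <= row for row in context.values())
        if not itemset or not has_objects:
            continue  # no sub-space for the empty set; skip empty extents
        first = min(position[a] for a in itemset)  # index i of F3S_i
        for n, part in enumerate(partitions):
            if first in part:
                counts[n] += 1
    return counts


print(item_order(CONTEXT))
print(partition_counts(CONTEXT, PARTITIONS))
```

Under these assumptions the sketch gives the item order $a_5, a_8, a_6, a_7, a_4, a_3, a_2, a_1$ and the counts 6, 6, 4, 1 for the four partitions, which agrees with the figures quoted above for the ordered data context.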
Figure 6: An example of ordered data context (items in the order $a_5$, $a_8$, $a_6$, $a_7$, $a_4$, $a_3$, $a_2$, $a_1$)

Summing up the analysis of the search space of closed itemsets: we can order the data context to obtain the ordered data context, for which the search space of closed itemsets is

$$F3S_m \cup F3S_{m-1} \cup F3S_{m-2} \cup F3S_{m-3} \cup \cdots \cup F3S_i \cup \cdots \cup F3S_1$$

and then decompose the search space into partitions. We can generate closed itemsets in each partition.

4.2 The new algorithm

Definition 11 Let $A_1 \subseteq A$ be an itemset, $A_1 = \{b_1, b_2, \ldots, b_i, \ldots, b_k\}$, $b_i \in A$, and let $A_1$ be an infrequent itemset. The candidate of the next closed itemset of $A_1$, noted $CA_1$, is

$$A_1 \oplus a_i = (A_1 \cap \{a_1, a_2, \ldots, a_{i-1}\}) \cup \{a_i\}$$

where $a_i < b_k$ and $a_i \notin A_1$, and $a_i$ is the biggest item of $A$ with $A_1 < A_1 \oplus a_i$ following the order $a_1 < \ldots < a_i < \ldots < a_m$.

We propose a new algorithm that can be used to generate closed itemsets or frequent closed itemsets. The principle of the algorithm is given by the following steps:

- Decompose the search space into partitions
  - Convert $(O, A, R)$ to its ordered data context, where $A = \{a_1, a_2, \ldots, a_i, \ldots, a_m\}$ after ordering
  - In order to balance the number of closed itemsets of the partitions, some items of $A$ are chosen to form an ordered set $P$:
    1) $P = \{a_{P_T}, a_{P_{T-1}}, \ldots, a_{P_k}, \ldots, a_{P_1}\}$, $|P| = T$, $a_{P_k} \in A$
    2) $a_{P_T} < a_{P_{T-1}} < \ldots < a_{P_k} < \ldots < a_{P_2} < a_{P_1} = a_m$
    3) A parameter $DP$ ($0 < DP < 1$) is used to choose $a_{P_k}$, where $DP = \dfrac{|\{a_1, \ldots, a_{P_k}\}|}{|\{a_1, \ldots, a_{P_{k-1}}\}|}$
  - Get the partitions $[a_{P_k}, a_{P_{k+1}})$ and $[a_{P_T})$:
    1) The interval $[a_{P_k}, a_{P_{k+1}})$ is the search space from item $a_{P_k}$ to item $a_{P_{k+1}}$
    2) $[a_{P_k}, a_{P_{k+1}}) = \bigcup_{P_{k+1} < i \le P_k} F3S_i$ and $[a_{P_T}) = \bigcup_{1 \le i \le P_T} F3S_i$
- Generate the next frequent closed itemset from an itemset $A_1$ for each partition
  - If the support of $A_1$ is at least minsupport, we search for the next closure of $A_1$
  - If the support of $A_1$ is below minsupport, we search for $CA_1$. The closed itemsets between $A_1$ and $CA_1$ are ignored

For instance, with $m = 8$ and $DP = 0.5$, step 3) gives $P_1 = 8$, $P_2 = 4$, $P_3 = 2$, $P_4 = 1$, which corresponds to the 4 partitions of Figure 5.

Conceptual clustering [5, 12] can seek clusters by concept structures. One approach to conceptual clustering is based on the concept lattice [3].
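To make the notion of "next closure" concrete, here is a hedged Python sketch of the classical NextClosure enumeration from formal concept analysis [8], followed by a support filter. It is not the partitioned algorithm of this section: it visits every closed itemset in lectic order and filters afterwards, whereas the steps above skip the closed itemsets between an infrequent $A_1$ and its candidate $CA_1$. The table `CONTEXT` is again an assumption, reconstructed from the extents of the lattice example, and the function names are hypothetical.

```python
# ASSUMPTION: context reconstructed from the extents of the lattice example.
CONTEXT = {
    1: {"a1", "a2", "a7"},
    2: {"a1", "a2", "a7", "a8"},
    3: {"a1", "a2", "a3", "a7", "a8"},
    4: {"a1", "a3", "a7", "a8"},
    5: {"a1", "a2", "a4", "a6"},
    6: {"a1", "a2", "a3", "a4", "a6"},
    7: {"a1", "a3", "a4", "a5"},
    8: {"a1", "a3", "a4", "a6"},
}


def closure(context, items, itemset):
    """C'' : items shared by all objects that contain the itemset."""
    shared = set(items)
    for row in context.values():
        if itemset <= row:
            shared &= row
    return frozenset(shared)


def next_closure(current, order, close):
    """Classical NextClosure step: the lectically next closed set, or None."""
    kept = set(current)
    for i in reversed(range(len(order))):
        item = order[i]
        if item in kept:
            kept.discard(item)
        else:
            candidate = close(frozenset(kept | {item}))
            # Accept only if the closure adds no item placed before `item`.
            if not ((candidate - kept) & set(order[:i])):
                return candidate
    return None


def frequent_closed_itemsets(context, minsupport):
    """Enumerate all closed itemsets, keep those with enough objects."""
    order = sorted(set().union(*context.values()))
    items = frozenset(order)

    def close(itemset):
        return closure(context, items, itemset)

    result = []
    current = close(frozenset())
    while current is not None:
        support = sum(1 for row in context.values() if current <= row)
        if support >= minsupport:
            result.append((sorted(current), support))
        current = next_closure(current, order, close)
    return result


for itemset, support in frequent_closed_itemsets(CONTEXT, minsupport=3):
    print(itemset, support)
```

With `minsupport=3` the sketch keeps the nine closed itemsets whose nodes carry e(3) or more in the lattice example, and with `minsupport=1` it returns every closed itemset that has at least one object.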
When minsupport = 1, this algorithm can be used to generate all closed itemsets, and then conceptual overlapping clusters based on the algorithm of [3].

5 Experimental results

We test our algorithm by generating frequent closed itemsets and clusters on some UCI data sets [14] (see Table 1).

Table 1: The datasets for experiments (columns: DataSet, Objects, Items, Closed itemsets)
1) breast-cancer-wisconsin
2) house-votes
3) audiology
4) lung-cancer
5) agaricus-lepiota
6) promoters
7) soybean-large
8) dermatology

The algorithm is implemented in Java and tested on all the above contexts in two cases, in order to compare and analyze its performance:

Case1: generating frequent itemsets and clusters separately from the context;
Case2: generating frequent itemsets and clusters from closed itemsets based on the new strategy.

The experimental results (see Figure 7) show that the total time cost of Case1 is much higher than that of Case2. So integrating cluster analysis and association analysis on the basis of closed itemset mining can reduce the high cost of the two mining tasks for large transaction data sets.

Figure 7: The time cost (milliseconds) for two cases on test datasets

6 Conclusion and further work

In this paper, we propose a strategy that unifies cluster analysis and association analysis for transactional databases in order to reduce the high cost of the data mining tasks. From the data context, the knowledge lattice can be generated with extended concepts. Extended concepts can contain the intent, the extent, the support and a similarity description. So closed frequent patterns and clusters can be produced from the same knowledge lattice or extended concepts. Furthermore, we present a new algorithm for the analysis of large and high-dimensional data. In future work, we will develop the algorithm further to analyze huge and distributed data, and improve it for mining non-transactional databases.

Acknowledgements: This work is supported by Science Foundation Ireland via the Autonomic Management of Communications Networks and Services programme (grant no. 04/IN3/I4040C) and the project of EU IST Network of Excellence OPAALS.

References:

[1] G. Birkhoff. Lattice Theory. American Mathematical Society, Providence, RI, 3rd edition.
[2] C. Carpineto and G. Romano. Galois: An order-theoretic approach to conceptual clustering. In Proc. of the Machine Learning Conf., pages 33-40.
[3] C. Carpineto and G. Romano. Galois: An order-theoretic approach to conceptual clustering. In Proceedings of ICML 93, pages 33-40, Amherst, July.
[4] C. Carpineto and G. Romano. Concept Data Analysis: Theory and Applications. John Wiley and Sons.
[5] D. H. Fisher. Knowledge acquisition via incremental conceptual clustering. Machine Learning, (2).
[6] H. Fu and E. Mephu Nguifo. Partitioning large data to scale up lattice-based algorithm. In Proceedings of ICTAI03, Sacramento, CA, November. IEEE Computer Press.
[7] H. Fu and E. Mephu Nguifo. Mining frequent closed itemsets for large data. In Proceedings of The 2004 International Conference on Machine Learning and Applications (ICMLA04), Louisville, USA, December.
[8] B. Ganter and R. Wille. Formal Concept Analysis. Mathematical Foundations. Springer.
[9] R. Godin, G. Mineau, R. Missaoui, and H. Mili. Méthodes de classification conceptuelle basées sur les treillis de Galois et applications. Revue d'intelligence artificielle, 9(2).
[10] R. Godin, R. Missaoui, and A. April. Experimental comparison of Galois lattice browsing with conventional information retrieval methods. Internat. J. Man-Machine Studies, (38).
[11] D. Kourie and G. Oosthuizen. Lattices in Machine Learning: Complexity Issues. Acta Informatica, 35(4).
[12] M. Lebowitz. Experiments with incremental concept formation: Unimem. Machine Learning, (2).
[13] E. Mephu Nguifo and P. Njiwoua. Treillis de concepts et classification supervisée. Technique et Science Informatiques, 24. Hermes-Lavoisier.
[14] C. Merz and P. Murphy. UCI Repository of Machine Learning databases.
[15] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Efficient mining of association rules using closed itemsets lattices. Journal of Information Systems, 24(1):25-46.
The Theory of Concept Analysis and Customer Relationship Mining
The Application of Association Rule Mining in CRM Based on Formal Concept Analysis HongSheng Xu * and Lan Wang College of Information Technology, Luoyang Normal University, Luoyang, 471022, China xhs_ls@sina.com
More informationCategorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
More informationA Web-based Browsing Mechanism Based on Conceptual Structures
A Web-based Browsing Mechanism Based on Conceptual Structures Mihye Kim and Paul Compton School of Computer Science and Engineering University of New South Wales, Sydney, NSW 2052, Australia {mihyek, compton}@cse.unsw.edu.au
More informationDecision Tree Learning on Very Large Data Sets
Decision Tree Learning on Very Large Data Sets Lawrence O. Hall Nitesh Chawla and Kevin W. Bowyer Department of Computer Science and Engineering ENB 8 University of South Florida 4202 E. Fowler Ave. Tampa
More informationInternational Journal of Advance Research in Computer Science and Management Studies
Volume 2, Issue 12, December 2014 ISSN: 2321 7782 (Online) International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online
More informationPreparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL
Preparing Data Sets for the Data Mining Analysis using the Most Efficient Horizontal Aggregation Method in SQL Jasna S MTech Student TKM College of engineering Kollam Manu J Pillai Assistant Professor
More informationOn algorithms for construction line diagrams of concept lattices and the set of all concepts
On algorithms for construction line diagrams of concept lattices and the set of all concepts Sergey A. Yevtushenko, August 7, 2001 Scientific advisor Prof. Dr. Tatiana Taran Outline of a talk Formal Concept
More informationEfficient Data Mining Based on Formal Concept Analysis
Efficient Data Mining Based on Formal Concept Analysis Gerd Stumme Institut für Angewandte Informatik und Formale Beschreibungsverfahren AIFB, Universität Karlsruhe, D 76128 Karlsruhe, Germany www.aifb.uni-karlsruhe.de/wbs/gst;
More informationImpact of Boolean factorization as preprocessing methods for classification of Boolean data
Impact of Boolean factorization as preprocessing methods for classification of Boolean data Radim Belohlavek, Jan Outrata, Martin Trnecka Data Analysis and Modeling Lab (DAMOL) Dept. Computer Science,
More informationDecision Trees for Mining Data Streams Based on the Gaussian Approximation
International Journal of Computer Sciences and Engineering Open Access Review Paper Volume-4, Issue-3 E-ISSN: 2347-2693 Decision Trees for Mining Data Streams Based on the Gaussian Approximation S.Babu
More informationMINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan
More informationApplied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.
Applied Mathematical Sciences, Vol. 7, 2013, no. 112, 5591-5597 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2013.38457 Accuracy Rate of Predictive Models in Credit Screening Anirut Suebsing
More informationD-optimal plans in observational studies
D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational
More informationInternational Journal of Innovative Research in Computer and Communication Engineering
FP Tree Algorithm and Approaches in Big Data T.Rathika 1, J.Senthil Murugan 2 Assistant Professor, Department of CSE, SRM University, Ramapuram Campus, Chennai, Tamil Nadu,India 1 Assistant Professor,
More informationAn Analysis on Density Based Clustering of Multi Dimensional Spatial Data
An Analysis on Density Based Clustering of Multi Dimensional Spatial Data K. Mumtaz 1 Assistant Professor, Department of MCA Vivekanandha Institute of Information and Management Studies, Tiruchengode,
More informationClustering Data Streams
Clustering Data Streams Mohamed Elasmar Prashant Thiruvengadachari Javier Salinas Martin gtg091e@mail.gatech.edu tprashant@gmail.com javisal1@gatech.edu Introduction: Data mining is the science of extracting
More informationWeb Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
More informationComparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
More informationSelf Organizing Maps for Visualization of Categories
Self Organizing Maps for Visualization of Categories Julian Szymański 1 and Włodzisław Duch 2,3 1 Department of Computer Systems Architecture, Gdańsk University of Technology, Poland, julian.szymanski@eti.pg.gda.pl
More informationLaboratory Module 8 Mining Frequent Itemsets Apriori Algorithm
Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Purpose: key concepts in mining frequent itemsets understand the Apriori algorithm run Apriori in Weka GUI and in programatic way 1 Theoretical
More informationAn Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset
P P P Health An Analysis of Missing Data Treatment Methods and Their Application to Health Care Dataset Peng Liu 1, Elia El-Darzi 2, Lei Lei 1, Christos Vasilakis 2, Panagiotis Chountas 2, and Wei Huang
More informationProtein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
More informationDynamic index selection in data warehouses
Dynamic index selection in data warehouses Stéphane Azefack 1, Kamel Aouiche 2 and Jérôme Darmont 1 1 Université de Lyon (ERIC Lyon 2) 5 avenue Pierre Mendès-France 69676 Bron Cedex France jerome.darmont@univ-lyon2.fr
More informationDistributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
More informationInternational Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
More informationBig Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
More informationCLANN: Concept Lattice-based Artificial Neural Network for supervised classification
CLANN: Concept Lattice-based Artificial Neural Network for supervised classification Norbert Tsopzé 1,2, Engelbert Mephu Nguifo 2, and Gilbert Tindo 1 1 Université de Yaoundé I, Faculté des Sciences, Département
More informationSPATIAL DATA CLASSIFICATION AND DATA MINING
, pp.-40-44. Available online at http://www. bioinfo. in/contents. php?id=42 SPATIAL DATA CLASSIFICATION AND DATA MINING RATHI J.B. * AND PATIL A.D. Department of Computer Science & Engineering, Jawaharlal
More informationMining Large Datasets: Case of Mining Graph Data in the Cloud
Mining Large Datasets: Case of Mining Graph Data in the Cloud Sabeur Aridhi PhD in Computer Science with Laurent d Orazio, Mondher Maddouri and Engelbert Mephu Nguifo 16/05/2014 Sabeur Aridhi Mining Large
More informationIntroducing diversity among the models of multi-label classification ensemble
Introducing diversity among the models of multi-label classification ensemble Lena Chekina, Lior Rokach and Bracha Shapira Ben-Gurion University of the Negev Dept. of Information Systems Engineering and
More informationA Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment
A Clustering Model for Mining Evolving Web User Patterns in Data Stream Environment Edmond H. Wu,MichaelK.Ng, Andy M. Yip,andTonyF.Chan Department of Mathematics, The University of Hong Kong Pokfulam Road,
More informationEnhanced data mining analysis in higher educational system using rough set theory
African Journal of Mathematics and Computer Science Research Vol. 2(9), pp. 184-188, October, 2009 Available online at http://www.academicjournals.org/ajmcsr ISSN 2006-9731 2009 Academic Journals Review
More information270107 - MD - Data Mining